Abstract: Opportunistic spectrum access (OSA) that allows secondary users to independently search for and exploit instantaneous spectrum availability is considered. The design objective is to maximize the throughput of a secondary user while limiting the probability of colliding with primary users. Integrated in the joint design are three basic components: a spectrum sensor that identifies spectrum opportunities, a sensing strategy that determines which channels in the spectrum to sense, and an access strategy that decides whether to access based on potentially erroneous sensing outcomes. This joint design is formulated as a constrained partially observable Markov decision process (POMDP), and a separation principle is established. The separation principle reveals the optimality of myopic policies for the design of the spectrum sensor and the access strategy, leading to closed-form optimal solutions. Furthermore, it decouples the design of the sensing strategy from that of the spectrum sensor and the access strategy, and reduces the constrained POMDP to an unconstrained one. Numerical examples are provided to study the tradeoff between sensing time and transmission time, the interaction between the physical layer spectrum sensor and the MAC layer sensing and access strategies, and the robustness of the ensuing design to model mismatch.

Index Terms: Cognitive radio, opportunistic spectrum access, partially observable Markov decision process (POMDP).


I.  Introduction 

Opportunistic spectrum access (OSA), first envisioned by Mitola [1] under the term “spectrum pooling” and then investigated by the DARPA XG program [2], has recently received increasing attention due to its potential for improving spectrum efficiency. The basic idea of OSA is to allow secondary users to search for, identify, and exploit instantaneous spectrum opportunities while limiting the interference perceived by primary users (or licensees).

Manuscript received February 27, 2007; revised January 17, 2008. This work was supported in part by the Army Research Laboratory CTA on Communication and Networks under Grant DAAD19-01-2-0011 and by the National Science Foundation under Grants CNS-0627090 and ECS-0622200. The material in this paper was presented in part at the IEEE Asilomar Conference on Signal, Systems, and Computers, Asilomar, CA, October/November 2006 and the IEEE Workshop on Signal Processing Advances in Wireless Communications, Helsinki, Finland, June 2007.

Y. Chen was with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616 USA. She is now with Cisco Systems, Inc., San Jose, CA USA (e-mail: yxchen@ece.ucdavis.edu).

Q. Zhao is with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616 USA (e-mail: qzhao@ece.ucdavis.edu).

A. Swami is with the Army Research Laboratory, Adelphi, MD 20783 USA (e-mail: aswami@arl.army.mil).

Communicated by A. Høst-Madsen, Associate Editor for Detection and Estimation.

Color versions of Figures 1-4 and 6-10 in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIT.2008.920248

In this paper, we address the design of OSA strategies for secondary users overlaying a slotted primary network. Integrated in the design are three basic components: 1) a spectrum sensor at the physical (PHY) layer that identifies instantaneous spectrum opportunities; 2) a spectrum sensing strategy at the medium access control (MAC) layer that specifies which channels in the spectrum to sense in each slot; and 3) a spectrum access strategy, also at the MAC layer, that determines whether to access the chosen channels based on imperfect sensing outcomes. The design objective is to maximize the throughput of a secondary user under the constraint that the probability of collision perceived by any primary user is below a predetermined threshold.

A.  Fundamental  Design  Tradeoffs 

We  provide  first  an  intuitive  understanding  of  the  fundamental 
tradeoffs  in  the  joint  design  of  the  three  basic  components. 

Spectrum Sensor: False Alarm Versus Miss-Detection: The spectrum sensor of a secondary user identifies spectrum opportunities by detecting the presence of primary signals, i.e., by performing a binary hypothesis test. With noise and fading, sensing errors are inevitable: false alarms occur when idle channels are detected as busy, and miss-detections occur when busy channels are detected as idle. In the event of a false alarm, a spectrum opportunity is overlooked by the sensor, and eventually wasted if the access strategy trusts the sensing outcome. On the other hand, miss-detections may lead to collisions with primary users. The tradeoff between false alarm and miss-detection is captured by the receiver operating characteristic (ROC) of the spectrum sensor, which relates the probability of detection (PD) and the probability of false alarm (PFA) (see an example in Fig. 1, where we consider an energy detector). The design of the spectrum sensor and the choice of the sensor operating point are thus important issues and should be addressed by considering the impact of sensing errors on the MAC layer performance in terms of throughput and collision probability. In particular, we are interested in the following fundamental question: which criterion should be adopted in the design of the spectrum sensor, the Bayes or the Neyman-Pearson (NP)? If the former, how do we choose the risks? If the latter, how should we set the constraint on the PFA?

Sensing Strategy: Gaining Immediate Access Versus Gaining Information for Future Use: Due to hardware limitations and the energy cost of spectrum monitoring, a secondary user may not be able to sense all the channels in the spectrum simultaneously. A sensing strategy is thus needed for intelligent channel selection to track the rapidly varying spectrum opportunities. The purpose of a sensing strategy is twofold: to find idle channels for immediate access and to gain statistical information on the spectrum occupancy for better opportunity tracking in the future. The optimal sensing strategy should thus strike a balance between these two often conflicting objectives.

Fig. 1. The ROC of an energy detector. Each point on the ROC curve corresponds to a sensor operating characteristic resulting from a different detection threshold of the energy detector ($\epsilon$: probability of false alarm; $\delta$: probability of miss-detection).

Access Strategy: Aggressive Versus Conservative: Based on the imperfect sensing outcomes given by the spectrum sensor, the secondary user needs to decide whether to access. An aggressive access strategy may lead to excessive collisions with primary users while a conservative one may result in throughput degradation due to overlooked opportunities. Whether to adopt an aggressive or a conservative access strategy depends on the operating characteristic of the spectrum sensor and the collision constraint at the MAC layer. Hence, a joint design of the PHY layer spectrum sensor and the MAC layer access strategy is necessary for optimality.

B.  Main  Results 

By modeling primary users’ spectrum occupancy as a Markov chain, we establish a decision-theoretic framework for the optimal joint design of OSA based on the theory of partially observable Markov decision processes (POMDPs). This framework captures the fundamental design tradeoffs discussed above. Within this framework, the optimal OSA strategy is given by the optimal policy of a constrained POMDP.

While powerful in problem modeling, POMDP suffers from the curse of dimensionality and does not easily lend itself to tractable solutions. Constraints on a POMDP further complicate the problem, often demanding randomized policies to achieve optimality. Our goal is to develop structural results that lead to simple yet optimal solutions and shed light on the interaction between the PHY and the MAC layers of OSA networks.

Single-Channel Sensing: We focus first on the case where the secondary user can sense and access one channel in each slot (e.g., in the case of single-carrier communications). We establish a separation principle for the optimal joint design of OSA. We show that the joint design can be carried out in two steps without losing optimality: first to choose a spectrum sensor and an access strategy that maximize the instantaneous throughput (i.e., the expected number of bits that can be delivered in the current slot) under the collision constraint, and then to choose a sensing strategy to optimize the overall throughput. As stated below, the significance of this separation principle is twofold.

• The separation principle reveals the optimality of myopic policies for the design of the spectrum sensor and the access strategy. Myopic policies aim solely at maximizing the immediate reward and ignore the impact of the current action on the future reward. Hence, obtaining myopic policies becomes a static optimization problem instead of a sequential decision-making problem. While myopic policies are rarely optimal for a general POMDP, we show that the rich structure of the problem at hand renders an exception. As a consequence, we are able to obtain an explicit design of the optimum spectrum sensor and a closed-form optimal access strategy. Moreover, this closed-form optimal design allows us to characterize quantitatively the interaction between the PHY layer spectrum sensor and the MAC layer access strategy.

• The separation principle decouples the design of the sensing strategy from that of the spectrum sensor and the access strategy. More importantly, the design of the sensing strategy is reduced to an unconstrained POMDP, which admits deterministic optimal policies. Unconstrained POMDPs have been well studied, and existing algorithms can be readily applied [3]–[6].

We also provide numerical examples to study design tradeoffs. We will see that miss-detections are more harmful to the throughput of the secondary user than false alarms. The tradeoff study between the spectrum sensing time and the data transmission time indicates that the spectrum sensor should take fewer channel measurements as the maximum allowable probability of collision increases. In other words, when the collision constraint is less restrictive, the secondary user can spend less time in sensing, leaving more time in a slot for data transmission. Robustness studies show that the throughput loss due to inaccuracies in the assumed Markovian model parameters is small, and more importantly, the probability of collision perceived by the primary network is not affected by model mismatch.

Multichannel Sensing: We then consider the scenario where the secondary user can sense and access multiple channels simultaneously in each slot. We show that the separation principle still holds if the spectrum sensor and the access strategy are designed independently across channels. We note that such independent design is suboptimal since it ignores the potential correlation among channel occupancies. We thus propose two heuristic approaches to exploit channel correlation, one at the PHY layer and the other at the MAC layer. Simulation results show that exploiting channel correlation at the PHY layer is more effective than at the MAC layer.

We also find that the performance of the PHY layer spectrum sensor can improve over time by incorporating the MAC layer sensing and access decisions. Such MAC layer decisions provide information on the evolution of the primary users’ spectrum occupancy, from which the a priori probabilities of the hypotheses employed by the spectrum sensor can be learned. This finding, along with the quantitative characterization of the impact of the spectrum sensor on the access strategy, illustrates the two-way interaction between the PHY and the MAC layers: the necessity of incorporating the sensor operating characteristics into the MAC design and the benefit of exploiting the MAC layer information in the PHY design.

C.  Related  Work 

Two types of spectrum opportunities have been considered in the literature: spatial and temporal. A majority of existing work on OSA focuses on exploiting spatial spectrum opportunities that are static or slowly varying in time (see [7]–[9] and references therein). A typical example application is the reuse of locally unused TV broadcast bands. In this context, due to the slow temporal variation of spectrum occupancy, real-time opportunity identification is not as critical a component as in applications that exploit temporal spectrum opportunities, and existing work often assumes perfect knowledge of spectrum opportunities in the whole spectrum at any time and location.

The exploitation of temporal spectrum opportunities resulting from the bursty traffic of primary users is addressed in [10]–[13] under the assumption of perfect sensing. In [10], MAC protocols are proposed for an ad hoc secondary network overlaying a Global System for Mobile Communications (GSM) cellular network. It is assumed that the secondary transmitter and receiver exchange information on which channel to use through a commonly agreed control channel. Different from [10], optimal distributed MAC protocols developed in [11] can synchronize the hopping patterns of the secondary transmitter and receiver without the aid of additional control channels. More recently, the design of optimal spectrum sensing and access strategies in a fading environment has been addressed under an energy constraint in [12]. In [13], access strategies for a slotted secondary user exploiting opportunities in an unslotted primary network are considered, where a round-robin single-channel sensing scheme is used. Modeling of spectrum occupancy has been addressed in [14]. Measurements obtained from spectrum monitoring testbeds demonstrate the Markovian transition between busy and idle channel states in wireless local-area networks (LANs).

Although the issue of spectrum sensing errors has been investigated at the PHY layer [15]–[19], cognitive MAC design in the presence of sensing errors has received little attention. To the best of our knowledge, [20] is the first work that integrates the operating characteristic of the spectrum sensor at the PHY layer with the MAC design. A heuristic approach to the joint PHY-MAC design of OSA is proposed in [20]. In this paper, we establish a decision-theoretic framework within which the optimal joint design of OSA in the presence of sensing errors can be systematically addressed and the interaction between the PHY and the MAC layers can be quantitatively characterized. Interestingly, the separation principle developed in this paper reveals that the heuristic approach proposed in [20] is optimal.

For an overview on challenges and recent developments in OSA, readers are referred to [21].

Fig. 2. The slot structure. [Figure: each slot consists of Spectrum Sensing, Data Transmission, and Acknowledgement.]


D.  Organization  and  Notation 

This paper is organized as follows. Section II describes the network model and the basic operations performed by a secondary user to exploit spectrum opportunities. In Section III, we introduce the three basic components of OSA and formulate their joint design as a constrained POMDP. In Section IV, we establish the separation principle for the optimal joint design of OSA with single-channel sensing. Section V extends the separation principle to multichannel sensing scenarios. Section VI concludes this paper.

Random variables and their realizations are denoted by capital and lower case letters, respectively. Vectors are denoted by boldfaced letters.


II.  Network  Model 

Consider a spectrum that consists of $N$ channels (e.g., different frequency bands or tones in an orthogonal frequency-division modulation (OFDM) system), each with bandwidth $B_n$ ($n = 1, \ldots, N$). These $N$ channels are licensed to a slotted primary network. We model the spectrum occupancy as a discrete-time homogeneous Markov chain with $2^N$ states. Specifically, let $S_n(t) \in \{0 \text{ (busy)}, 1 \text{ (idle)}\}$ denote the occupancy of channel $n$ in slot $t$. The spectrum occupancy state (SOS), denoted as $\mathbf{S}(t) = [S_1(t), \ldots, S_N(t)]$, follows a Markov chain with state space $\mathbb{S} = \{0, 1\}^N$. The transition probabilities of the SOS are denoted as $P(\mathbf{s}' \mid \mathbf{s}) = \Pr\{\mathbf{S}(t+1) = \mathbf{s}' \mid \mathbf{S}(t) = \mathbf{s}\}$. Note that the transition probabilities are determined by the dynamics of the primary traffic. We assume that they are known and remain unchanged in $T$ slots.
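
As an illustration of the state space and transition probabilities just defined, the following sketch builds the $2^N \times 2^N$ transition matrix for the special case in which the channels evolve independently, each as a two-state chain. Independence across channels, the function names, and the numerical values are assumptions made here for illustration only; the model in the text also covers correlated channel occupancies.

```python
from itertools import product


def channel_transition(p_busy_to_idle, p_idle_to_idle):
    """Two-state chain for one channel; state 0 = busy, 1 = idle.

    Returns a 2x2 nested list P with P[s][s_next] = Pr{S(t+1)=s_next | S(t)=s}.
    """
    return [
        [1.0 - p_busy_to_idle, p_busy_to_idle],
        [1.0 - p_idle_to_idle, p_idle_to_idle],
    ]


def sos_transition_matrix(per_channel):
    """Build the transition probabilities of the SOS over {0,1}^N.

    Assumption (not from the paper): channels evolve independently, so the
    joint transition probability is the product of the per-channel ones.
    Returns (states, P) with P[i][j] = Pr{S(t+1)=states[j] | S(t)=states[i]}.
    """
    n = len(per_channel)
    states = list(product((0, 1), repeat=n))
    matrix = []
    for s in states:
        row = []
        for s_next in states:
            prob = 1.0
            for ch in range(n):
                prob *= per_channel[ch][s[ch]][s_next[ch]]
            row.append(prob)
        matrix.append(row)
    return states, matrix


# Hypothetical example: N = 3 statistically identical channels.
chains = [channel_transition(0.2, 0.8) for _ in range(3)]
states, P = sos_transition_matrix(chains)
```

The matrix has $2^N$ rows, which makes the exponential growth of the state space with the number of channels concrete.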

We consider a secondary ad hoc network whose users independently and selfishly exploit instantaneous spectrum opportunities in these $N$ channels. At the beginning of each slot,1 a secondary user with data to transmit chooses a set of channels to sense. A spectrum sensor is used to detect the states of the chosen channels. Based on the sensing outcomes, the secondary user decides which sensed channels to access. Due to hardware and energy constraints, we assume that a secondary user can sense and access at most $L$ ($1 \le L \le N$) channels in a slot. At the end of the slot, the receiver acknowledges each successful transmission. The basic slot structure is illustrated in Fig. 2.

Our goal is to develop an optimal OSA strategy for the secondary user, which sequentially determines which channels in the spectrum to sense, how to design the spectrum sensor, and whether to access based on the imperfect sensing outcomes. The design objective is to maximize the throughput of the secondary user during a desired period of $T$ slots under the constraint that the probability of collision $P_n(t)$ perceived by the primary network in any channel $n$ and slot $t$ is capped below a predetermined threshold $\zeta$, i.e.,

$$P_n(t) = \Pr\{\Phi_n(t) = 1 \mid S_n(t) = 0\} \le \zeta, \quad \forall n, t \qquad (1)$$

where $\Phi_n(t) \in \{0 \text{ (no access)}, 1 \text{ (access)}\}$ denotes the access decision of the secondary user.

1With the knowledge of the slot length and through sensing the transmissions of primary users, secondary users can synchronize to the slot structure. Furthermore, the primary network may broadcast periodic beacon signals to keep its own users synchronized. These beacon signals can be exploited by secondary users for synchronization.

Remarks: 

1) We assume that the transition probabilities of the SOS are known or have been learned. We take the viewpoint that such statistical models of a particular spectrum region should be obtained through measurements before the deployment of secondary networks. This is for the purpose of evaluating the potential gain or profit of secondary market in that spectrum region. Such statistical models can then be made available to secondary users to facilitate the design. We are, however, aware that in some scenarios, secondary users may have imperfect knowledge of the underlying Markovian model. In Section IV-F, we study the robustness of the optimal OSA design to a mismatched Markovian model. For the case where the Markovian model is unknown, formulations and algorithms for POMDP with an unknown model exist in the literature [22] and can be applied to this problem.

2) We use the conditional probability of collision $P_n(t)$ in the design constraint and impose the collision constraint on every channel $n$ and slot $t$. This ensures that a primary user experiences collisions no more than $\zeta$ fraction of its transmission time regardless of where and when it transmits. Note that if the unconditional probability of collision $\Pr\{\Phi_n(t) = 1, S_n(t) = 0\}$ is adopted, the constraint depends on the traffic load of primary users in channels chosen by the secondary users; primary users who have light traffic load may not be as well protected as those with heavy traffic load.

3) We assume that secondary users exploit spectrum opportunities independently and selfishly. That is, secondary users do not exchange their information on the SOS and each one aims to maximize its own throughput without taking into consideration the interactions among secondary users. This assumption is suitable for secondary ad hoc networks where there is no central coordinator or dedicated control/communication channel. The secondary network can adopt a carrier sensing mechanism to avoid collisions among competing secondary users as detailed in [11], [20]. We point out that such selfish decisions may not be optimal in terms of network-level throughput. Nevertheless, this formulation allows us to focus on the basic components of OSA and highlight the interactions among them.
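
A short calculation makes the point of Remark 2 concrete. The symbol $\rho_n$ (the probability that channel $n$ is busy in a given slot) and the numerical values are introduced here purely for illustration and are not part of the original formulation.

```latex
\Pr\{\Phi_n(t) = 1,\ S_n(t) = 0\}
  = \Pr\{\Phi_n(t) = 1 \mid S_n(t) = 0\}\,\Pr\{S_n(t) = 0\}
  = P_n(t)\,\rho_n .
% Capping the joint probability at \zeta therefore only guarantees
P_n(t) \le \min\{1,\ \zeta / \rho_n\} .
% Example with \zeta = 0.05:
%   \rho_n = 0.9 (heavy load):  P_n(t) \le 0.05 / 0.9 \approx 0.056
%   \rho_n = 0.1 (light load):  P_n(t) \le 0.05 / 0.1 = 0.5
```

Under this reading, a lightly loaded primary user could see up to half of its transmissions collide, whereas the conditional constraint (1) caps that fraction at the same threshold regardless of load.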

III.  Constrained  POMDP  Formulation 

In this section, we develop a decision-theoretic framework for the optimal joint design of the three basic OSA components based on the theory of POMDP. We focus first on single-channel sensing ($L = 1$). Extensions to multichannel sensing scenarios are detailed in Section V.
A. Spectrum Sensor

Suppose that channel $n$ is chosen in slot $t$. The spectrum sensor detects the presence of primary users in this channel by performing a binary hypothesis test

$$\mathcal{H}_0 : S_n(t) = 1 \text{ (idle)} \quad \text{vs.} \quad \mathcal{H}_1 : S_n(t) = 0 \text{ (busy)}. \qquad (2)$$

Let $\Theta_n(t) \in \{0 \text{ (busy)}, 1 \text{ (idle)}\}$ denote the sensing outcome (i.e., the result of the binary hypothesis test). The performance of the spectrum sensor is characterized by the PFA $\epsilon_n(t)$ and the probability of miss-detection (PM) $\delta_n(t)$

$$\epsilon_n(t) = \Pr\{\text{decide } \mathcal{H}_1 \mid \mathcal{H}_0 \text{ is true}\} = \Pr\{\Theta_n(t) = 0 \mid S_n(t) = 1\} \qquad (3a)$$

$$\delta_n(t) = \Pr\{\text{decide } \mathcal{H}_0 \mid \mathcal{H}_1 \text{ is true}\} = \Pr\{\Theta_n(t) = 1 \mid S_n(t) = 0\}. \qquad (3b)$$

Subject to the constraint that the PFA is no larger than $\epsilon_n(t)$, the largest achievable PD, denoted as $P_{D,\max}^{(n)}(\epsilon_n(t))$, can be attained by the optimal NP detector or an optimal Bayesian detector with a suitable set of risks [23, Sec. 2.2.1]. All operating points $(\epsilon, \delta)$ above the best ROC curve $P_{D,\max}^{(n)}$ are thus infeasible.
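
To make the operating point $(\epsilon, \delta)$ concrete, the sketch below traces the ROC of an energy detector by sweeping its threshold. The signal model is an assumption chosen here because it gives closed-form expressions: $m$ complex Gaussian samples, noise variance `noise_var`, and a complex Gaussian primary signal of variance `signal_var`, so that the normalized energy follows a Gamma distribution with integer shape $m$ under either hypothesis. All names and values are hypothetical.

```python
import math


def gamma_survival(m, x):
    """Pr{G > x} for G ~ Gamma(shape=m, scale=1) with integer m >= 1."""
    term = 1.0
    total = 1.0
    for k in range(1, m):
        term *= x / k          # term now equals x**k / k!
        total += term
    return math.exp(-x) * total


def energy_detector_point(threshold, m, noise_var, signal_var):
    """Operating point (epsilon, delta) of an energy detector.

    Assumptions (for illustration only): m complex Gaussian samples, noise
    variance noise_var, and a complex Gaussian primary signal of variance
    signal_var. The detector declares "busy" when the summed energy
    exceeds the threshold.
    """
    # False alarm: the channel is idle (noise only) but energy > threshold.
    epsilon = gamma_survival(m, threshold / noise_var)
    # Detection: the channel is busy (signal plus noise) and energy > threshold.
    p_detect = gamma_survival(m, threshold / (noise_var + signal_var))
    return epsilon, 1.0 - p_detect


# Sweep the threshold to trace the ROC (PD = 1 - delta versus epsilon).
roc = [energy_detector_point(tau, m=10, noise_var=1.0, signal_var=1.0)
       for tau in (5.0, 10.0, 15.0, 20.0, 25.0)]
```

Raising the threshold lowers the PFA and raises the PM, which is the tradeoff the ROC curve captures.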

Let

$$\mathbb{A}_\delta(n) \triangleq \{(\epsilon, \delta) : 0 \le \epsilon \le 1 - \delta \le P_{D,\max}^{(n)}(\epsilon)\}$$

denote all feasible operating points of the spectrum sensor.2 As illustrated in Fig. 3, the best ROC curve $P_{D,\max}^{(n)}$ achieved by the optimal NP detector forms the upper boundary of the feasible set $\mathbb{A}_\delta(n)$. We also note that every sensor operating point $(\epsilon_n, \delta_n)$ below the best ROC curve lies on a line that connects two boundary points and hence can be achieved by randomizing between two optimal NP detectors with properly chosen constraints on the PFA [23, Sec. 2.2.2]. For example, the operating point $(\epsilon_n, \delta_n)$ as shown in Fig. 3 can be achieved by applying the optimal NP detector under the constraint of PFA $\le \epsilon_n^{(1)}$ with probability $p = \frac{\epsilon_n - \epsilon_n^{(2)}}{\epsilon_n^{(1)} - \epsilon_n^{(2)}}$ and the optimal NP detector under the constraint of PFA $\le \epsilon_n^{(2)}$ with probability $1 - p$. Therefore, the design of the spectrum sensor is reduced to the choice of a desired sensor operating point in $\mathbb{A}_\delta(n)$.
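
The randomization argument can be sketched in a few lines. The mixing probability solves $p\,\epsilon^{(1)} + (1 - p)\,\epsilon^{(2)} = \epsilon$, and the resulting miss-detection probability is the same convex combination of the two boundary values. The function names and the two boundary points are hypothetical and are not taken from the paper.

```python
def mixing_probability(eps_target, eps_1, eps_2):
    """Probability p of using detector 1 so the mixed PFA equals eps_target.

    Solves p * eps_1 + (1 - p) * eps_2 = eps_target.
    """
    if eps_1 == eps_2:
        raise ValueError("the two boundary points must differ in PFA")
    p = (eps_target - eps_2) / (eps_1 - eps_2)
    if not 0.0 <= p <= 1.0:
        raise ValueError("eps_target must lie between eps_1 and eps_2")
    return p


def mixed_operating_point(p, point_1, point_2):
    """Operating point (epsilon, delta) obtained by randomizing.

    point_1 and point_2 are (epsilon, delta) pairs on the best ROC curve;
    detector 1 is used with probability p and detector 2 otherwise.
    """
    eps = p * point_1[0] + (1.0 - p) * point_2[0]
    delta = p * point_1[1] + (1.0 - p) * point_2[1]
    return eps, delta


# Hypothetical boundary points, not taken from the paper.
point_1 = (0.30, 0.05)
point_2 = (0.10, 0.25)
p = mixing_probability(0.20, point_1[0], point_2[0])
eps_mix, delta_mix = mixed_operating_point(p, point_1, point_2)
```

With these values the two detectors are used equally often, and the mixed point sits midway along the line joining the two boundary points.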

The design of the optimal NP detector is a well-studied problem, which is not the focus of this paper. Our objective is to define the criterion and the constraint under which the spectrum sensor should be designed, equivalently, to find the optimal sensor operating point $(\epsilon_n^*(t), \delta_n^*(t)) \in \mathbb{A}_\delta(n)$ to achieve the best tradeoff between false alarm and miss-detection. Note that the optimal sensor operating point may vary with time (see Section V-D for an example).

B.  Sensing  and  Access  Strategies 

In each slot, a sensing strategy decides which channel in the spectrum to sense, and an access strategy determines whether to access given the sensing outcome.3 Below we illustrate the sequence of operations in each slot.

2Since the two hypotheses in (2) play a symmetric role, we have assumed, without loss of generality, that the PD is no smaller than the PFA, i.e., $1 - \delta \ge \epsilon$.

Fig. 3. Illustration of the set $\mathbb{A}_\delta(n)$ of all feasible sensor operating points $(\epsilon_n, \delta_n)$ ($\delta_n^{(i)} = 1 - P_{D,\max}^{(n)}(\epsilon_n^{(i)})$, $i = 1, 2$). [Axis: probability of false alarm $\epsilon$.]

At the beginning of slot $t$, the SOS transits to $\mathbf{S}(t) = [S_1(t), \ldots, S_N(t)]$ according to the transition probabilities of the underlying Markov chain. The secondary user first chooses a channel $a(t) \in \mathbb{A}_s = \{1, \ldots, N\}$ to sense and a feasible sensor operating point $(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$. It then determines whether to access $\Phi_a(t) \in \{0 \text{ (no access)}, 1 \text{ (access)}\}$ by taking into account the sensing outcome $\Theta_a(t) \in \{0 \text{ (busy)}, 1 \text{ (idle)}\}$ provided by the spectrum sensor that is designed according to the chosen operating point $(\epsilon_a(t), \delta_a(t))$. A collision with primary users happens when the secondary user accesses a busy channel. At the end of this slot, the receiver acknowledges a successful transmission $K_a(t) \in \{0 \text{ (no ACK)}, 1 \text{ (ACK)}\}$. We assume that the ACKs are received without errors.4
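
The sequence of operations for the sensed channel can be sketched as a small simulation step. This is a minimal illustration rather than the authors' implementation: the function name, the `access_prob` dictionary, and the convention of passing in the uniform random draws (so that the step is reproducible) are all assumptions made here.

```python
def run_slot(channel_idle, eps, delta, access_prob, u_sense, u_access):
    """Simulate sensing, access, and ACK for the sensed channel in one slot.

    channel_idle : True if the channel is idle, False if it is busy.
    eps, delta   : chosen sensor operating point (PFA, PM).
    access_prob  : dict mapping the sensing outcome (0 busy, 1 idle) to the
                   probability of accessing; 0/1 values give a deterministic
                   access rule.
    u_sense, u_access : uniform(0, 1) draws supplied by the caller.
    Returns (theta, phi, ack, collision).
    """
    if channel_idle:
        theta = 0 if u_sense < eps else 1      # false alarm with prob. eps
    else:
        theta = 1 if u_sense < delta else 0    # miss-detection with prob. delta
    phi = 1 if u_access < access_prob[theta] else 0
    # An ACK is returned only when an idle channel is accessed.
    ack = 1 if (phi == 1 and channel_idle) else 0
    # A collision occurs when a busy channel is accessed.
    collision = phi == 1 and not channel_idle
    return theta, phi, ack, collision


# Hypothetical access rule that simply trusts the sensing outcome.
trust_sensor = {0: 0.0, 1: 1.0}
outcome = run_slot(True, 0.1, 0.05, trust_sensor, u_sense=0.5, u_access=0.5)
```

In a longer simulation the two uniform draws would come from a random number generator, for example `random.Random(seed).random()`.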

C.  Constrained  POMDP  Formulation 

The sequential decision-making process described above can be modeled as a POMDP with constraint given in (1). The underlying system of this POMDP is the SOS with state space $\mathbb{S} = \{0, 1\}^N$ and transition probabilities $P(\mathbf{s}' \mid \mathbf{s})$. We describe below the actions, observations, and reward structure of the resulting POMDP.

Action Space: The action in the POMDP formulation consists of three parts: a sensing decision $a(t) \in \mathbb{A}_s$, a spectrum sensor design $(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$, and an access decision $\Phi_a(t) \in \{0, 1\}$.

Observation Space: As will become clear later, optimal channel selection for opportunity tracking relies on the exploitation of the statistical information on the SOS provided by the observation history of the secondary users. To ensure synchronous hopping in the spectrum without introducing extra control message exchange, the secondary user and its desired receiver must have the same history of observations so that they make the same channel selection decisions. Since sensing errors may cause different sensing outcomes at the transmitter and the receiver, the acknowledgment $K_a(t) \in \{0, 1\}$ should be used as the common observation in each slot.

3An alternative formulation of the joint design is to combine the spectrum sensor with the access strategy. In this case, the access decision is made directly based on the channel measurements. It can be readily shown that this formulation is equivalent to the one adopted here.

4Note that the ACK is sent after the successful reception of data. Hence, the channel over which the ACK is transmitted is ensured to be idle in this slot.

Reward: A natural definition of the reward is the number of bits that can be delivered by the secondary user, which is assumed to be proportional to the channel bandwidth. Given sensing action $a(t)$ and access action $\Phi_a(t)$, the immediate reward $R(t)$ can be defined as

$$R(t) = K_a(t) B_a = S_a(t) \Phi_a(t) B_a. \qquad (4)$$

Hence, the expected total reward of the POMDP represents the overall throughput, i.e., the expected total number of bits that can be delivered by the secondary user in $T$ slots.
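
As a hedged illustration of how the reward (4) and the collision constraint (1) depend on the sensor operating point, the sketch below assumes that the access decision depends on the channel state only through the sensing outcome. The access probabilities `f_busy` and `f_idle` (the probability of accessing when the sensor reports busy or idle) and the idle probability `p_idle` are notation introduced here, not the paper's.

```python
def access_prob_given_idle(eps, f_busy, f_idle):
    """Pr{access | channel idle}: the sensor reports busy with prob. eps."""
    return eps * f_busy + (1.0 - eps) * f_idle


def collision_probability(delta, f_busy, f_idle):
    """Pr{access | channel busy}: the sensor reports idle with prob. delta."""
    return (1.0 - delta) * f_busy + delta * f_idle


def expected_immediate_reward(p_idle, bandwidth, eps, f_busy, f_idle):
    """Expected value of S_a * Phi_a * B_a when the channel is idle w.p. p_idle."""
    return p_idle * bandwidth * access_prob_given_idle(eps, f_busy, f_idle)


# Hypothetical numbers: trust the sensor (access only when it reports idle).
reward = expected_immediate_reward(0.6, 1.0, eps=0.1, f_busy=0.0, f_idle=1.0)
collision = collision_probability(delta=0.05, f_busy=0.0, f_idle=1.0)
```

With a rule that trusts the sensor, false alarms reduce the expected reward while the collision probability equals the miss-detection probability, which is the tension the joint design has to resolve.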

Belief Vector: Due to partial spectrum monitoring and sensing errors, a secondary user cannot directly observe the true SOS. It can, however, infer the SOS from its decision and observation history. As shown in [3], the statistical information on the SOS provided by the entire decision and observation history can be encapsulated in a belief vector $\boldsymbol{\Lambda}(t) = \{\lambda_{\mathbf{s}}(t)\}_{\mathbf{s} \in \mathbb{S}}$, where $\lambda_{\mathbf{s}}(t) \in [0, 1]$ denotes the conditional probability (given the decision and observation history) that the SOS is $\mathbf{s}$ in slot $t$

$$\lambda_{\mathbf{s}}(t) \triangleq \Pr\{\mathbf{S}(t) = \mathbf{s} \mid \boldsymbol{\Lambda}(1), \text{decision and observation history}\} \qquad (5)$$

where $\boldsymbol{\Lambda}(1)$ is the initial belief vector, i.e., the a priori distribution of the SOS at time $t = 1$, which can be set to the stationary distribution of the underlying Markov chain if no information on the initial SOS is available.

Policy: A joint design of OSA is given by policies of the
above POMDP. Specifically, a sensing policy $\pi_s$ specifies a
sequence of functions $\pi_s = [\mu_s(1), \ldots, \mu_s(T)]$, where $\mu_s(t)$
maps a belief vector $\Lambda(t)$ to a channel $a(t) \in \mathbb{A}_s$ to be sensed in
this slot. Since the optimal policy for a finite-horizon POMDP is
generally nonstationary, the functions $\{\mu_s(t)\}_{t=1}^{T}$ are not identical.
A sensor operating policy $\pi_\delta$ specifies, in each slot $t$, a spectrum
sensor design $(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$ based on the current
belief vector $\Lambda(t)$ and the chosen channel $a(t)$. An access policy
$\pi_c$ specifies an access decision $\Phi_a(t) \in \{0, 1\}$ in each slot $t$
based on the current belief vector $\Lambda(t)$ and the sensing outcome
$\Theta_a(t) \in \{0, 1\}$ at the chosen channel $a(t)$.

The above defined policies are deterministic. For unconstrained
POMDPs, there always exist deterministic optimal
policies. For constrained POMDPs, however, we may need to
resort to randomized policies to achieve optimality. A randomized
sensing policy $\pi_s$ defines a sequence of functions, each
mapping a belief vector $\Lambda(t)$ to a probability mass function
(pmf) on the set $\mathbb{A}_s$ of channels, and a randomized sensor
operating policy $\pi_\delta$ defines the mapping from $\Lambda(t)$ to a probability
density function (pdf) on the set $\mathbb{A}_\delta(a(t))$ of feasible
sensor operating points. A randomized access policy $\pi_c$ maps
$\Lambda(t)$ and sensing outcome $\Theta_a(t)$ to a transmission probability.
In other words, the actions chosen in a randomized policy
are probability distributions. Due to the uncountable space
of probability distributions, randomized policies are usually
computationally prohibitive.

Objective and Constraint: We aim to develop the optimal
joint design of OSA $\{\pi_\delta^*, \pi_s^*, \pi_c^*\}$ that maximizes the expected
total number of bits that can be delivered by the secondary user
(i.e., the expected total reward of the POMDP) in $T$ slots under
the collision constraint given in (1)

$$\{\pi_\delta^*, \pi_s^*, \pi_c^*\} = \arg\max_{\pi_\delta, \pi_s, \pi_c} \mathbb{E}_{\{\pi_\delta, \pi_s, \pi_c\}}\left[\sum_{t=1}^{T} R(t) \,\middle|\, \Lambda(1)\right]$$

$$\text{s.t.}\quad P_a(t) = \Pr\{\Phi_a(t) = 1 \mid S_a(t) = 0\} \le \zeta, \quad \forall a, t \tag{6}$$

where $\mathbb{E}_{\{\pi_\delta, \pi_s, \pi_c\}}$ represents the expectation given that policies
$\{\pi_\delta, \pi_s, \pi_c\}$ are employed, and $P_a(t)$ is the probability of collision
perceived by the primary network in channel $a(t)$ and slot $t$.

We consider in (6) the nontrivial case where the conditional
collision probability $P_a(t)$ is well defined, i.e.,
$\Pr\{S_a(t) = 0\} > 0$. Note that $\Pr\{S_a(t) = 0\} = 0$ (or 1) implies that the
system state $S_a(t)$ is known based on the current belief vector
$\Lambda(t)$. In this case, the optimal access decision is straightforward,
and the design of the spectrum sensor becomes unnecessary
since the channel state is already known.

IV.  Separation  Principle  for  Optimal  OSA 

In  this  section,  we  solve  the  constrained  POMDP  given  in 
(6)  to  obtain  the  optimal  joint  design  of  OSA.  Specifically,  we 
establish  a  separation  principle  that  reveals  the  optimality  of 
deterministic  policies  and  leads  to  closed-form  optimal  design 
of  the  spectrum  sensor  and  the  access  strategy.  It  also  allows  us 
to  characterize  quantitatively  the  interaction  between  the  PHY 
layer  sensor  operating  characteristics  and  the  MAC  layer  access 
strategy. 

A.  Optimality  Equation 

The  first  step  to  solving  (6)  is  to  express  the  objective  and 
the  constraint  explicitly  as  functions  of  the  actions.  We  establish 
first  the  optimality  of  deterministic  sensing  and  sensor  operating 
policies,  which  significantly  simplifies  the  action  space. 

Optimality  of  Deterministic  Policies:  In  Proposition  1,  we 
show  that  it  is  sufficient  to  consider  deterministic  sensing  and 
sensor  operating  policies  in  the  optimal  joint  design  of  OSA. 

Proposition 1: For the optimal joint design of OSA given by
(6), there exist deterministic optimal sensing and sensor operating
policies.

Proof: The proof is based on the concavity of the best ROC
curve and the fact that the collision constraint is imposed on
every channel. See details in Appendix A.


As a result of Proposition 1, the secondary user needs to
choose, in each slot, a channel $a(t) \in \mathbb{A}_s$ to sense, a feasible
sensor operating point $(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$, and a pair of
transmission probabilities $(f_a(0, t), f_a(1, t))$, where

$$f_a(\theta, t) = \Pr\{\Phi_a(t) = 1 \mid \Theta_a(t) = \theta\} \in [0, 1]$$

is the probability of accessing channel $a(t)$ given sensing outcome
$\Theta_a(t) = \theta \in \{0, 1\}$ in the current slot. The composite action
space is then given by

$$\mathbb{A} = \{(a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))) : a \in \mathbb{A}_s,\ (\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a),\ (f_a(0), f_a(1)) \in [0, 1]^2\}. \tag{7}$$

Objective Function: Let $V_t(\Lambda(t))$ be the value function,
which represents the maximum expected reward that can be
obtained starting from slot $t$ ($1 \le t \le T$) given belief vector
$\Lambda(t)$. Given that the secondary user takes action $A(t) = A \in \mathbb{A}$
and observes acknowledgment $K_a(t) = k \in \{0, 1\}$, the reward
that can be accumulated starting from slot $t$ consists of two
parts: the immediate reward $R(t) = k B_a$ and the maximum
expected future reward $V_{t+1}(\Lambda(t+1))$, where

$$\Lambda(t+1) = \{\lambda_s(t+1)\}_{s \in \mathbb{S}} = \mathcal{T}(\Lambda(t) \mid A, k)$$

represents the updated knowledge of the SOS after incorporating
the action $A(t) = A$ and the acknowledgment $K_a(t) = k$
in slot $t$. Averaging over all possible states $s \in \mathbb{S}$ and
acknowledgments $k \in \{0, 1\}$ and then maximizing over all actions
$A \in \mathbb{A}$, we arrive at the optimality equation shown in (8a)-(8b)
at the bottom of the page, where

$$A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\} \in \mathbb{A}$$

denotes a composite action taken in the current slot $t$ and

$$U_A(k \mid s) \triangleq \Pr\{K_a(t) = k \mid S(t) = s\}$$

is the conditional pmf of the acknowledgment $K_a(t)$ given current
state $S(t) = s$ and action $A(t)$.
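
For reference, the recursion described in this paragraph can be written out explicitly. The display below is a reconstruction from the verbal description (immediate reward plus expected future reward, averaged over states and acknowledgments, then maximized over actions) and from the simplified form (15); the typesetting is an assumption, and the belief update $\mathcal{T}$ is the one given in (10).

```latex
\begin{align}
V_t(\Lambda(t)) &= \max_{A \in \mathbb{A}} \sum_{s \in \mathbb{S}} \lambda_s(t)
   \sum_{k=0}^{1} U_A(k \mid s)\,
   \bigl[\, k B_a + V_{t+1}\bigl(\mathcal{T}(\Lambda(t) \mid A, k)\bigr) \bigr],
   \qquad 1 \le t < T \tag{8a} \\
V_T(\Lambda(T)) &= \max_{A \in \mathbb{A}} \sum_{s \in \mathbb{S}} \lambda_s(T)\,
   U_A(1 \mid s)\, B_a \tag{8b}
\end{align}
```

In the last slot only the immediate reward matters, which is why (8b) contains no future-value term.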

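Noting that the acknowledgment can be written as
$K_a(t) = S_a(t)\,\Phi_a(t)$, we obtain its conditional pmf $U_A(k \mid s)$ as

$$U_A(1 \mid s) = \Pr\{S_a(t) = 1 \mid S(t) = s\}\,\Pr\{\Phi_a(t) = 1 \mid S(t) = s, S_a(t) = 1\}$$

$$= 1_{[s_a = 1]} \sum_{\theta=0}^{1} \Pr\{\Theta_a(t) = \theta \mid S(t) = s\}\, f_a(\theta)$$

$$= s_a\,[\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)] \tag{9a}$$

$$U_A(0 \mid s) = 1 - U_A(1 \mid s) \tag{9b}$$

where $1_{[\cdot]}$ is the indicator function and

$$\Pr\{S_a(t) = 1 \mid S(t) = s\} = 1_{[s_a = 1]}$$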


is given by the occupancy state $s_a$ of channel $a$. Applying
Bayes' rule, we obtain the updated belief vector
$\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid A, k)$ as

$$\lambda_s(t+1) = \frac{\sum_{s' \in \mathbb{S}} \lambda_{s'}(t)\, P(s \mid s')\, U_A(k \mid s')}{\sum_{s' \in \mathbb{S}} \lambda_{s'}(t)\, U_A(k \mid s')}, \quad s \in \mathbb{S}. \tag{10}$$


We see from (10) that by adopting the acknowledgment $K_a(t)$
as their observation, the transmitter and the receiver will have
the same updated belief vector $\Lambda(t+1)$, which ensures that they
tune to the same channel in the next slot.
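
The belief update (10) together with the acknowledgment pmf (9) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names, the dictionary representation of the belief vector, and the independent-channel transition model are all assumptions made for the example.

```python
from itertools import product

def ack_probability(state, channel, eps, f0, f1):
    """U_A(1|s) from (9a): an ACK requires an idle channel and a transmission."""
    return state[channel] * (eps * f0 + (1.0 - eps) * f1)

def independent_transition(alpha, beta):
    """Build P(s|s') for channels that evolve independently (an assumption)."""
    def prob(prev, nxt):
        p = 1.0
        for n, (was_idle, is_idle) in enumerate(zip(prev, nxt)):
            to_idle = beta[n] if was_idle else alpha[n]
            p *= to_idle if is_idle else 1.0 - to_idle
        return p
    return prob

def update_belief(belief, transition, channel, ack, eps, f0, f1):
    """Bayes update (10); belief maps each state tuple to its probability."""
    weights = {}
    for prev, prob in belief.items():
        u1 = ack_probability(prev, channel, eps, f0, f1)
        weights[prev] = prob * (u1 if ack == 1 else 1.0 - u1)
    norm = sum(weights.values())
    if norm == 0.0:
        raise ValueError("the observed acknowledgment has zero probability")
    return {
        nxt: sum(w * transition(prev, nxt) for prev, w in weights.items()) / norm
        for nxt in belief
    }

# Two channels, uniform prior, ACK received on channel 0 (1 = idle, 0 = busy).
states = list(product((0, 1), repeat=2))
uniform = {s: 0.25 for s in states}
step = independent_transition(alpha=[0.2, 0.4], beta=[0.8, 0.6])
after_ack = update_belief(uniform, step, channel=0, ack=1, eps=0.1, f0=0.0, f1=1.0)
```

Because both the transmitter and the receiver feed the same acknowledgment into this update, they arrive at the same belief, which is the synchronization argument made above.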

Note from (8) that the action

$$A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\}$$

taken by the secondary user affects the expected total reward
in two ways: it acquires an immediate reward $R(t) = k B_a$
and transforms the current belief vector $\Lambda(t)$ to a new one
$\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid A, k)$, which determines the future reward
$V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))$. Hence, the function of the secondary
user's action is twofold: to exploit immediate spectrum opportunities
and to gain information on the SOS (characterized
by belief vector $\Lambda(t+1)$) so that more rewarding decisions
can be made in the future. As a consequence, the optimal joint
design of OSA should achieve the tradeoff between these two
often conflicting objectives. Myopic policies that aim solely
at maximizing the instantaneous throughput (i.e., the expected
immediate reward) without considering future consequences
are generally suboptimal.

Collision Constraint: The collision probability $P_a(t)$ is determined
by the sensor operating point $(\epsilon_a, \delta_a)$ and the transmission
probabilities $(f_a(0), f_a(1))$: see (11) at the bottom of the
page. In principle, by solving (8) recursively (starting from the
last slot $T$ using (8b)) under the constraint of (11), we can obtain
the maximum overall throughput $V_1(\Lambda(1))$ of the secondary
user and the corresponding policies $\{\pi_\delta^*, \pi_s^*, \pi_c^*\}$. We, however,
note that (8) is generally intractable due to the uncountable action
space $\mathbb{A}$.


B.  The  Separation  Principle 

Theorem  1:  The  Separation  Principle  for  OSA  with 
Single-Channel  Sensing 

The  joint  design  of  OSA  given  in  (8)  can  be  carried  out  in  two 
steps  without  losing  optimality. 

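• Step 1: Choose the sensor operating policy $\pi_\delta$ and the
access policy $\pi_c$ to maximize the instantaneous throughput
subject to the collision constraint. Specifically, for any
chosen channel $a$ in any slot $t$, the optimal sensor operating
point $(\epsilon_a^*, \delta_a^*)$ and transmission probabilities
$(f_a^*(0), f_a^*(1))$ are given by

$$\{(\epsilon_a^*, \delta_a^*), (f_a^*(0), f_a^*(1))\} = \arg\max_{\substack{(\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a) \\ (f_a(0), f_a(1)) \in [0,1]^2}} \mathbb{E}[R(t) \mid \Lambda(t)]$$

$$= \arg\max_{\substack{(\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a) \\ (f_a(0), f_a(1)) \in [0,1]^2}} \epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1) \tag{12a}$$

$$\text{s.t.}\quad P_a(t) = (1 - \delta_a) f_a(0) + \delta_a f_a(1) \le \zeta. \tag{12b}$$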


• Step 2: Using the optimal sensor operating and access
policies $\{\pi_\delta^*, \pi_c^*\}$ given by (12), choose the sensing policy to
maximize the overall throughput. Specifically, the optimal
sensing policy $\pi_s^*$ is given by

$$\pi_s^* = \arg\max_{\pi_s} \mathbb{E}_{\pi_s}\left[\sum_{t=1}^{T} R(t) \,\middle|\, \Lambda(1)\right]. \tag{13}$$


Proof: The proof is based on the convexity of the value
function $V_t(\Lambda(t))$ with respect to the belief vector $\Lambda(t)$ and the
structure of the conditional observation distributions $U_A(k \mid s)$.
See Appendix B for details.
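
Step 1 of the theorem is a static problem, so for a fixed sensor operating point it can be checked numerically. The sketch below is a brute-force illustration under stated assumptions, not the closed-form solution derived later: the function name and the grid resolution are hypothetical, and the sensor operating point `(eps, delta)` is treated as given rather than optimized.

```python
def solve_step1_on_grid(eps, delta, zeta, steps=100):
    """Grid search over (f0, f1) for problem (12) with a fixed sensor (eps, delta).

    Returns (objective, f0, f1), where the objective is
    eps * f0 + (1 - eps) * f1 as in (12a) and the collision constraint is
    (1 - delta) * f0 + delta * f1 <= zeta as in (12b).
    """
    best = (-1.0, 0.0, 0.0)
    for i in range(steps + 1):
        f0 = i / steps
        for j in range(steps + 1):
            f1 = j / steps
            collision = (1.0 - delta) * f0 + delta * f1
            if collision > zeta + 1e-12:
                continue  # violates the collision constraint (12b)
            gain = eps * f0 + (1.0 - eps) * f1
            if gain > best[0]:
                best = (gain, f0, f1)
    return best

# Sensor operating at delta = zeta: the search returns "trust the sensor".
gain, f0, f1 = solve_step1_on_grid(eps=0.3, delta=0.05, zeta=0.05)
```

Since both the objective and the constraint are linear in the transmission probabilities, the optimum sits at a vertex of the feasible region, which is why a closed form is available.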


The separation principle simplifies the optimal joint design of
OSA in two ways. First, it reveals that myopic policies, rarely
optimal for a general POMDP, are optimal for the design of the
spectrum sensor and the access strategy. We can thus obtain
the optimal spectrum sensor $(\epsilon_a^*, \delta_a^*) \in \mathbb{A}_\delta(a)$ and the optimal
transmission probabilities $(f_a^*(0), f_a^*(1)) \in [0, 1]^2$ by solving
the static optimization problem given in (12). This allows us to
characterize quantitatively the interaction between the spectrum
sensor and the access strategy as given in Proposition 2 and to
obtain the optimal joint design in closed form as given in Theorem 2.
While the proof is lengthy, there is an intuitive explanation
for this apparently surprising result. We note that upon receiving
the ACK $K_a(t) = 1$, the secondary user knows exactly
that the chosen channel is idle. However, when $K_a(t) = 0$ (no
packet is received), the secondary receiver cannot tell whether
the chosen channel is busy or not accessed. Hence, $K_a(t) = 1$
provides the secondary user with more information on the current
SOS. We also note that accessing the chosen channel maximizes
not only the instantaneous throughput but also the chance
of receiving the more informative observation $K_a(t) = 1$. Hence,
getting immediate reward and gaining information for more rewarding
future decisions are no longer conflicting here.

Second,  the  separation  principle  decouples  the  design  of 
the  sensing  strategy  from  that  of  the  spectrum  sensor  and  the 


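$$P_a(t) \triangleq \Pr\{\Phi_a(t) = 1 \mid S_a(t) = 0\}$$

$$= \sum_{\theta=0}^{1} \Pr\{\Theta_a(t) = \theta \mid S_a(t) = 0\}\,\Pr\{\Phi_a(t) = 1 \mid \Theta_a(t) = \theta, S_a(t) = 0\}$$

$$= (1 - \delta_a) f_a(0) + \delta_a f_a(1) \le \zeta. \tag{11}$$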




IEEE  TRANSACTIONS  ON  INFORMATION  THEORY,  VOL.  54,  NO.  5,  MAY  2008 


Fig.  4.  Illustration  of  conservative  and  aggressive  regions. 


access strategy, and reduces the sensing strategy from a constrained
POMDP (6) to an unconstrained one with finite action
space (13). This is because the sensor operating points and the
transmission probabilities determined by (12) have ensured
the collision constraint regardless of channel selections. The
optimal sensing policy is thus obtained by maximizing the
overall throughput without any constraint.

C.  Interaction  Between  the  PHY  and  the  MAC  Layers 

Before solving for the optimal sensor operating and access
policies, we study the interaction between the PHY layer spectrum
sensor and the MAC layer access strategy.

We  note  that  when  the  spectrum  sensor  at  the  PHY  layer  is 
given,  the  separation  principle  still  holds  for  the  design  of  the 
sensing  and  access  strategies.  The  optimal  access  strategy  for  a 
given  spectrum  sensor  can  thus  be  obtained. 

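Proposition 2: Given a chosen channel $a$ and a feasible
sensor operating point $(\epsilon_a, \delta_a)$, the optimal transmission
probabilities $(f_a^*(0), f_a^*(1))$ are given by

$$(f_a^*(0), f_a^*(1)) = \begin{cases} \left(\dfrac{\zeta - \delta_a}{1 - \delta_a},\ 1\right), & \delta_a < \zeta \\ (0, 1), & \delta_a = \zeta \\ \left(0,\ \dfrac{\zeta}{\delta_a}\right), & \delta_a > \zeta. \end{cases} \tag{14}$$

Proof: The proof is based on the separation principle (12)
and the fact that all feasible operating points lie above the line
$1 - \delta_a = \epsilon_a$. See details in Appendix C.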

As seen from Proposition 2, randomized access policies are
necessary to achieve optimality when $\delta_a \ne \zeta$. Moreover, Proposition 2
quantitatively characterizes the impact of the sensor performance
$\delta_a$ on the optimal access strategy $(f_a^*(0), f_a^*(1))$. As
illustrated in Fig. 4, the set $\mathbb{A}_\delta(a)$ of feasible sensor operating
points can be partitioned into two regions: the “conservative”
region ($\delta_a > \zeta$) and the “aggressive” region ($\delta_a < \zeta$). When
$\delta_a > \zeta$, the spectrum sensor detects a busy channel as idle
with high probability (i.e., a miss-detection occurs). Hence, the access
policy should be conservative to ensure that the collision probability
is capped below $\zeta$. Specifically, even when the sensing
outcome $\Theta_a(t) = 1$ indicates an idle channel, the secondary
user should only transmit with probability $\zeta/\delta_a < 1$. When the
channel is sensed as busy ($\Theta_a(t) = 0$), the user should always
refrain from transmission. On the other hand, when $\delta_a < \zeta$, the
probability of false alarm is high; the spectrum sensor is likely
to overlook an opportunity. Hence, the secondary user should
adopt an aggressive access policy: always transmit when the
channel is sensed as idle and transmit with probability
$(\zeta - \delta_a)/(1 - \delta_a) > 0$
even when the sensing outcome indicates a busy channel. When
$\delta_a = \zeta$, the access policy is to simply trust the sensing outcome,
i.e., access if and only if the channel is sensed to be available:
$\Phi_a(t) = \Theta_a(t)$. We will show in Section IV-D that the splitting
point $\delta_a = \zeta$ on the best ROC curve is the optimal
sensor operating point.
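
The three cases of (14) translate directly into a small function. This is a minimal sketch with hypothetical names; it assumes $0 < \zeta < 1$ and a miss-detection probability in $[0, 1]$, and it checks that the collision probability (11) meets the cap $\zeta$ with equality in every region.

```python
def optimal_access(delta, zeta):
    """Return (f0, f1) from (14): transmission probabilities after sensing
    'busy' (outcome 0) and 'idle' (outcome 1)."""
    if not 0.0 < zeta < 1.0 or not 0.0 <= delta <= 1.0:
        raise ValueError("expected 0 < zeta < 1 and 0 <= delta <= 1")
    if delta < zeta:   # aggressive region: transmit sometimes even when sensed busy
        return ((zeta - delta) / (1.0 - delta), 1.0)
    if delta > zeta:   # conservative region: hold back even when sensed idle
        return (0.0, zeta / delta)
    return (0.0, 1.0)  # splitting point: trust the sensing outcome

def collision_probability(delta, f0, f1):
    """P_a(t) from (11)."""
    return (1.0 - delta) * f0 + delta * f1

for delta in (0.01, 0.05, 0.2):
    f0, f1 = optimal_access(delta, zeta=0.05)
    print(delta, f0, f1, collision_probability(delta, f0, f1))
```

In each region the constraint is tight, so the access policy uses the whole collision budget allowed by the primary network.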

Similar to Proposition 2, we can quantitatively study the impact
of the access strategy on the spectrum sensor design by
solving (12) for the optimal sensor operating points when the
transmission probabilities are given. This result is omitted to
avoid unnecessary repetition. Details can be found in [25].

D.  Optimal  Joint  Design  of  Spectrum  Sensor  and  Access  Policy 

Optimizing  (14)  over  all  feasible  sensor  operating  points,  we 
obtain  an  explicit  optimal  design  for  the  spectrum  sensor  and  a 
closed-form  deterministic  optimal  access  policy  in  Theorem  2. 

Theorem 2: For any chosen channel $a$ in any slot, the optimal
sensor should adopt the optimal NP detector with constraint
$\delta_a^* = \zeta$ on the PM. Correspondingly, the optimal access
policy is to trust the sensing outcome given by the spectrum
sensor, i.e., $f_a^*(0) = 0$ and $f_a^*(1) = 1$.

Proof: The proof of Theorem 2 exploits the convexity of
the set $\mathbb{A}_\delta(a)$ of feasible sensor operating points, which follows
directly from the concavity of the best ROC curve [23]. See
Appendix D for details.

We find that the optimal sensor operating point coincides with
the splitting point $\delta_a^* = \zeta$ of the “conservative” region and the
“aggressive” region on the best ROC curve (see Fig. 4). This
indicates that at $\delta_a^* = \zeta$, the best tradeoff between false alarm
and miss-detection is achieved and the access policy does not
need to be conservative or aggressive. We thus have a simple and
deterministic optimal access policy: trust the sensing outcome.
Summarized below are the properties of the optimal sensor operating
and access policies given in Theorem 2.

Properties 1: The optimal spectrum sensor design and the
optimal access policy are as follows.

P1.1 time-invariant and belief-independent.

P1.2 model-independent.

As a result of P1.1, the spectrum sensor can be configured
off-line, and there is no need to calculate and store the optimal
transmission probabilities, leading to significant reduction
in both implementation complexity and memory requirement.
The second property is that the optimal design of the spectrum
sensor and the access strategy does not require the knowledge
of the transition probabilities of the underlying Markov process.
Since the probability of collision (11) is solely determined by
the sensor operating and access policies, P1.2 indicates that the
collision constraint on the joint OSA design can be ensured regardless
of the accuracy of the Markovian model used by the
secondary user. In other words, the primary network is not affected
by the inaccurate model adopted by the secondary user.
Model mismatch only affects the performance of the secondary
user (see Fig. 8 for a simulation example).

E.  Optimal  Sensing  Policy 

As revealed by the separation principle, the optimal sensing
policy can be obtained by solving an unconstrained POMDP
with finite action space $\mathbb{A}_s$. Specifically, by applying the optimal
spectrum sensor design and the optimal access policy given in
Theorem 2 to (8), we simplify the optimality equation as shown
in (15a)-(15b) at the bottom of the page. By applying $f_a^*(0) = 0$
and $f_a^*(1) = 1$ to (9), we obtain the conditional observation
probability $U_a(k \mid s)$ as

$$U_a(1 \mid s) = s_a (1 - \epsilon_a^*), \qquad U_a(0 \mid s) = 1 - U_a(1 \mid s) \tag{16}$$

where $\epsilon_a^*$ is the PFA associated with the PD $1 - \delta_a^* = 1 - \zeta$
on the best ROC curve. The updated belief vector
$\mathcal{T}(\Lambda(t) \mid a, k)$ can be obtained by using (10) with $U_A(k \mid s)$
replaced by $U_a(k \mid s)$ in (16).

It is shown in [3] that the value function of an unconstrained
POMDP with finite action space is piecewise linear and can
be solved via linear programming. We can thus use the existing
computationally efficient algorithms [4]-[6] to solve (8) for the
optimal sensing policy.
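
For very small problems, the recursion (15) can also be evaluated by exhaustive enumeration of actions and acknowledgments. The sketch below is only meant to make the recursion concrete; it is not one of the efficient algorithms of [4]-[6], its cost grows exponentially with the horizon, and the function name, the list-based belief, and the independent-channel transition model are assumptions of the example.

```python
from itertools import product

def solve_sensing_pomdp(alpha, beta, bandwidth, eps, horizon, belief):
    """Return (V_1(belief), first channel to sense) by brute-force recursion on (15).

    belief is a list of probabilities ordered like product((0, 1), repeat=N),
    with 1 = idle and 0 = busy; eps[a] is the PFA of the sensor on channel a.
    """
    n = len(alpha)
    states = list(product((0, 1), repeat=n))

    def step_prob(prev, nxt):
        p = 1.0
        for ch in range(n):
            to_idle = beta[ch] if prev[ch] == 1 else alpha[ch]
            p *= to_idle if nxt[ch] == 1 else 1.0 - to_idle
        return p

    trans = [[step_prob(sp, s) for s in states] for sp in states]

    def value(t, lam):
        best_val, best_ch = float("-inf"), None
        for a in range(n):
            total = 0.0
            for k in (0, 1):
                # lam[i] * U_a(k|s_i), with U_a taken from (16).
                weights = []
                for i, s in enumerate(states):
                    u1 = s[a] * (1.0 - eps[a])
                    weights.append(lam[i] * (u1 if k == 1 else 1.0 - u1))
                prob_k = sum(weights)
                if prob_k == 0.0:
                    continue
                future = 0.0
                if t < horizon:
                    nxt = [sum(weights[i] * trans[i][j] for i in range(len(states)))
                           / prob_k for j in range(len(states))]
                    future = value(t + 1, nxt)[0]
                total += prob_k * (k * bandwidth[a] + future)
            if total > best_val:
                best_val, best_ch = total, a
        return best_val, best_ch

    return value(1, list(belief))

v2, first = solve_sensing_pomdp([0.2, 0.4, 0.6], [0.8, 0.6, 0.4],
                                [1.0, 1.0, 1.0], [0.1, 0.1, 0.1],
                                horizon=2, belief=[0.125] * 8)
```

With a horizon of one slot the function reduces to (15b); longer horizons show the value of the information carried by the acknowledgment.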

Fig. 5. The Markov channel model.

Although myopic sensor operating and access policies are
shown to be optimal for the joint design of OSA (see the separation
principle), the myopic sensing policy is suboptimal in general.
Interestingly, it has been shown in [24] that, when the SOS
evolves independently and identically across channels, the myopic
sensing policy is optimal and has a simple and robust structure
that obviates the need for knowing the transition probabilities.
When the channel occupancy states are correlated, the myopic
approach can serve as a suboptimal solution with reduced
complexity.
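
A myopic sensing rule of the kind discussed above simply picks the channel with the largest expected immediate reward. The sketch below assumes the optimal sensor and access policy of Theorem 2, so that the expected reward on channel $a$ is the idle probability times $(1 - \epsilon_a^*)$ times $B_a$; the function name and the use of per-channel marginal idle probabilities are assumptions of the example.

```python
def myopic_channel(idle_prob, bandwidth, eps):
    """Index of the channel maximizing Pr{idle} * (1 - PFA) * bandwidth."""
    scores = [p * (1.0 - e) * b for p, b, e in zip(idle_prob, bandwidth, eps)]
    return max(range(len(scores)), key=scores.__getitem__)

# Equal bandwidths and sensors: the most likely idle channel is chosen.
choice = myopic_channel([0.3, 0.6, 0.5], [1.0, 1.0, 1.0], [0.1, 0.1, 0.1])
```

Ties are broken in favor of the lowest channel index, which is an arbitrary choice.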

F.  Numerical  Examples 

Here we provide numerical examples to study different factors
that affect the optimal joint design of OSA. We consider
$N = 3$ channels, each with bandwidth $B_n = 1$. While the
separation principle applies to arbitrarily correlated SOS, we
consider here the case where the SOS evolves independently
but not identically across these three channels for simplicity. In
this case, the SOS dynamics can be characterized by the transition
probabilities $\alpha = [\alpha_1, \alpha_2, \alpha_3]$ and $\beta = [\beta_1, \beta_2, \beta_3]$,
where $\alpha_n$ denotes the probability that channel $n$ transits from
state 0 (busy) to state 1 (idle), and $\beta_n$ denotes the probability
that channel $n$ stays in state 1 (see Fig. 5). In all examples,
the transition probabilities are given by $\alpha = [0.2, 0.4, 0.6]$ and
$\beta = [0.8, 0.6, 0.4]$. The horizon length is $T = 10$ slots, and
the maximum allowable probability of collision is $\zeta = 0.05$.
We use the normalized overall throughput $V_1(\Lambda(1))/T$, where
$\Lambda(1)$ is the stationary distribution of the SOS, to evaluate the
performance of the optimal OSA design.
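
The initial belief $\Lambda(1)$ used in these examples follows from the two-state chain of Fig. 5. Solving $\pi_1 = (1 - \pi_1)\alpha_n + \pi_1 \beta_n$ gives the stationary idle probability $\pi_1 = \alpha_n / (1 + \alpha_n - \beta_n)$, and independence across channels gives the joint distribution as a product. The function names below are hypothetical.

```python
from itertools import product

def stationary_idle_probability(alpha_n, beta_n):
    """Stationary probability of state 1 (idle) for one two-state channel."""
    return alpha_n / (1.0 + alpha_n - beta_n)

def stationary_belief(alpha, beta):
    """Joint stationary distribution of the SOS for independent channels."""
    idle = [stationary_idle_probability(a, b) for a, b in zip(alpha, beta)]
    belief = {}
    for state in product((0, 1), repeat=len(alpha)):
        prob = 1.0
        for is_idle, q in zip(state, idle):
            prob *= q if is_idle else 1.0 - q
        belief[state] = prob
    return belief

initial_belief = stationary_belief([0.2, 0.4, 0.6], [0.8, 0.6, 0.4])
```

For the transition probabilities used here, every channel is idle half of the time in steady state, so the initial belief is uniform over the eight occupancy states even though the channels differ in how quickly they change state.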

To illustrate the interaction between the PHY layer spectrum
sensor and the MAC layer access policy, we consider a simple
spectrum sensing scenario where the background noise and the
primary signal are modeled as white Gaussian processes. Let
$\sigma_{n,0}^2$ and $\sigma_{n,1}^2$ denote, respectively, the noise and the primary
signal power in channel $n$. At the beginning of each slot, the
spectrum sensor takes $M$ independent measurements
$\mathbf{Y}_n = [Y_{n,1}, \ldots, Y_{n,M}]$ from chosen channel $n$ and performs the
following binary hypothesis test:

$$\mathcal{H}_0\,(S_n = 1): \mathbf{Y}_n \sim \mathcal{N}(\mathbf{0}_M, \sigma_{n,0}^2 \mathbf{I}_M) \quad \text{vs.} \quad \mathcal{H}_1\,(S_n = 0): \mathbf{Y}_n \sim \mathcal{N}(\mathbf{0}_M, (\sigma_{n,1}^2 + \sigma_{n,0}^2)\mathbf{I}_M) \tag{17}$$

where $\mathcal{N}(\mathbf{0}_M, \sigma^2 \mathbf{I}_M)$ denotes the $M$-dimensional Gaussian
distribution with identical mean 0 and variance $\sigma^2$ in each dimension.
An energy detector is optimal under the NP criterion
[23, Sec. 2.6.2]

$$\|\mathbf{Y}_n\|^2 = \sum_{i=1}^{M} Y_{n,i}^2 \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta_n. \tag{18}$$


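$$V_t(\Lambda(t)) = \max_{a \in \mathbb{A}_s} \sum_{s \in \mathbb{S}} \lambda_s(t) \sum_{k=0}^{1} U_a(k \mid s)\,[k B_a + V_{t+1}(\mathcal{T}(\Lambda(t) \mid a, k))], \quad 1 \le t < T \tag{15a}$$

$$V_T(\Lambda(T)) = \max_{a \in \mathbb{A}_s} \sum_{s \in \mathbb{S}} \lambda_s(T)\, U_a(1 \mid s)\, B_a. \tag{15b}$$

Fig. 6. The impact of sensor operating characteristics on the performance of
the optimal OSA design.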


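The PFA and the PM of the energy detector are given by [23,
Sec. 2.6.2]

$$\epsilon_n = 1 - \gamma\left(\frac{M}{2}, \frac{\eta_n}{2\sigma_{n,0}^2}\right), \qquad \delta_n = \gamma\left(\frac{M}{2}, \frac{\eta_n}{2(\sigma_{n,0}^2 + \sigma_{n,1}^2)}\right) \tag{19}$$

where

$$\gamma(m, a) = \frac{1}{\Gamma(m)} \int_0^a t^{m-1} e^{-t}\, dt$$

is the incomplete gamma function. The optimal decision
threshold of the energy detector is chosen so that $\delta_n^* = \zeta$.
Unless otherwise mentioned, we assume that $M = 10$,
$\sigma_{n,0}^2 = \sigma_0^2 = 0$ dB, and $\sigma_{n,1}^2 = \sigma_1^2 = 5$ dB for all channels
$n = 1, \ldots, N$.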
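
The error probabilities (19) and the threshold that fixes the PM at $\zeta$ can be computed with the standard library alone when $M$ is even, because the regularized incomplete gamma function with integer first argument has a finite-sum closed form. The sketch below relies on that assumption (even $M$), uses linear-scale powers (0 dB = 1, 5 dB = $10^{0.5}$), and finds the threshold by bisection; all names are hypothetical.

```python
import math

def regularized_lower_gamma(m, x):
    """gamma(m, x) as normalized in (19), for a positive integer m."""
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def energy_detector_errors(threshold, num_samples, noise_power, signal_power):
    """Return (PFA, PM) of the energy detector (18) according to (19)."""
    if num_samples <= 0 or num_samples % 2:
        raise ValueError("this sketch supports only even, positive M")
    m = num_samples // 2
    pfa = 1.0 - regularized_lower_gamma(m, threshold / (2.0 * noise_power))
    pm = regularized_lower_gamma(
        m, threshold / (2.0 * (noise_power + signal_power)))
    return pfa, pm

def threshold_for_miss_probability(zeta, num_samples, noise_power, signal_power):
    """Bisection for the threshold with PM = zeta (PM increases with threshold)."""
    lo, hi = 0.0, 1.0
    while energy_detector_errors(hi, num_samples, noise_power, signal_power)[1] < zeta:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if energy_detector_errors(mid, num_samples, noise_power, signal_power)[1] < zeta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eta = threshold_for_miss_probability(0.05, 10, 1.0, 10 ** 0.5)
pfa, pm = energy_detector_errors(eta, 10, 1.0, 10 ** 0.5)
```

The resulting pair `(pfa, pm)` is the operating point on the best ROC curve at which Theorem 2 places the optimal sensor.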

Impact of Sensor Operating Characteristics: Fig. 6 shows
the impact of sensor operating characteristics on the secondary
user's throughput and the optimal access policy. The upper
graph plots the maximum normalized throughput $V_1(\Lambda(1))/T$
versus the PM $\delta$. The optimal transmission probabilities
$f^*(0)$ and $f^*(1)$ are shown in the middle and the lower graph,
respectively. We can see that the maximum throughput is
achieved at $\delta^* = \zeta = 0.05$ and the transmission probabilities
change with $\delta$ as given by Proposition 2. Interestingly, the
throughput curve is concave with respect to $\delta$ in the “aggressive”
region ($\delta < \zeta$) and convex in the “conservative” region
($\delta > \zeta$). The performance thus decays at a faster rate when the
sensor operating point drifts toward the “conservative” region.
This suggests that miss-detections are more harmful to the OSA
design than false alarms.

Impact of the Number of Channel Measurements: In this
example, we study the tradeoff between the spectrum sensing
time, which is determined by the number $M$ of channel measurements
taken by the spectrum sensor, and the transmission
time. Taking more channel measurements can improve the fidelity
of the sensing outcome but will reduce the data transmission
time and hence the number of transmitted bits. We are
thus motivated to study the throughput of the secondary user
as a function of $M$ for different maximum allowable probabilities
of collision $\zeta$. We assume that each channel measurement
takes $c = 5\%$ of a slot time. The transmission time is thus
given by $1 - Mc = 1 - 0.05M$. Assuming that the number
of bits that can be transmitted by the secondary user is proportional
to both the channel bandwidth and the transmission
time, we modify the immediate reward (4) of the POMDP to
$R(t) = (1 - Mc) K_a(t) B_a$.

Fig. 7. The impact of the number of channel measurements on the performance
of the optimal OSA design.

Fig. 7 shows that the throughput of the secondary user increases
and then decreases with the number $M$ of channel measurements.
Note that the PM is a function of the number $M$
of channel measurements and the detection threshold $\eta^*$ of the
energy detector (as seen from (19)). When the PM is fixed to
be $\delta^* = \zeta$ according to the separation principle, the detection
threshold $\eta^*$ increases with $M$, and hence the PFA $\epsilon^*$ decreases
with $M$. As a consequence, when $M$ is small, the throughput
of the secondary user is limited by the large PFA. On the other
hand, when $M$ is large, the PFA is reduced at the expense of
less transmission time in each slot, which also leads to low
throughput. We observe that the optimal number $M^*$ of channel
measurements at which the throughput is maximized decreases
with the maximum allowable collision probability $\zeta$. The reason
behind this observation is that the PM $\delta^*$ increases with $\zeta$ and
hence fewer measurements are required to achieve the same PFA
(as seen from (19)).
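
The shape of this tradeoff can be previewed without solving the POMDP by looking at the per-slot factor $(1 - Mc)(1 - \epsilon^*(M))$, i.e., the remaining transmission time multiplied by the probability that an idle channel is not missed. This is only a rough proxy introduced here for illustration, not the throughput plotted in Fig. 7; it assumes even $M$, linear-scale powers of 1 and $10^{0.5}$, and hypothetical function names.

```python
import math

def lower_gamma_int(m, x):
    """Regularized lower incomplete gamma function for a positive integer m."""
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def false_alarm_at_fixed_miss(num_samples, zeta, noise_power, signal_power):
    """PFA of the energy detector whose threshold gives PM = zeta (even M only)."""
    if num_samples <= 0 or num_samples % 2:
        raise ValueError("this sketch supports only even, positive M")
    m = num_samples // 2
    total_power = noise_power + signal_power
    lo, hi = 0.0, 1.0
    while lower_gamma_int(m, hi / (2.0 * total_power)) < zeta:
        hi *= 2.0
    for _ in range(200):  # bisection on the threshold
        mid = 0.5 * (lo + hi)
        if lower_gamma_int(m, mid / (2.0 * total_power)) < zeta:
            lo = mid
        else:
            hi = mid
    return 1.0 - lower_gamma_int(m, hi / (2.0 * noise_power))

def per_slot_proxy(num_samples, zeta, cost=0.05,
                   noise_power=1.0, signal_power=10 ** 0.5):
    """(1 - M c) * (1 - PFA): a proxy for the reward earned on an idle channel."""
    pfa = false_alarm_at_fixed_miss(num_samples, zeta, noise_power, signal_power)
    return (1.0 - num_samples * cost) * (1.0 - pfa)

def best_even_sample_count(zeta, candidates=range(2, 20, 2)):
    """Even M in the candidate set that maximizes the proxy."""
    return max(candidates, key=lambda m: per_slot_proxy(m, zeta))

best_m = best_even_sample_count(0.05)
```

The proxy rises while the falling PFA dominates and then drops once the lost transmission time takes over, mirroring the qualitative behavior described above.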

Impact of Mismatched Markov Model: In this example, we
study the impact of mismatched Markovian models on the OSA
performance. We assume that the true transition probabilities
are given by $\alpha$ and $\beta$. The secondary user employs the optimal
OSA design based on inaccurate transition probabilities $\alpha'$ and
$\beta'$. In the upper half of Fig. 8, we plot the relative throughput
loss as a function of the relative estimation error $\phi$ in transition
probabilities, where
$\phi = \frac{\alpha_n' - \alpha_n}{\alpha_n} = \frac{\beta_n' - \beta_n}{\beta_n}$ ($n = 1, 2, 3$). Note
that when $\phi = 0$, the secondary user has perfect knowledge
of the transition probabilities and hence achieves the maximum
throughput. Inaccurate knowledge can cause performance loss.

Fig. 8. The impact of mismatched Markov model on the performance of the
optimal OSA strategy.


We observe that the relative throughput loss is below 4% even
when the relative error is up to 20%. In the lower graph, we
examine the probability of collision perceived by the primary
network. We see that the probability of collision is not affected
by inaccurate transition probabilities, which confirms P1.2.

V.  OSA  With  Multichannel  Sensing 

In this section, we address the joint design of OSA in the case
where multiple channels can be sensed and accessed simultaneously
in each slot ($L > 1$). We focus on the extension of the
separation principle developed in Section IV.

A. Optimal Joint Design

Within the POMDP framework presented in Section III, we
first describe the three basic components of OSA with multichannel
sensing and then derive the optimality equation.

Spectrum Sensor: Suppose that a set $\mathcal{A}(t) \subseteq \{1, \ldots, N\}$
of channels is chosen in slot $t$, where $|\mathcal{A}(t)| = L > 1$. The
spectrum sensor performs a $2^L$-ary hypothesis test

$$\mathcal{H}_0: S_{\mathcal{A}}(t) = [1, 1, \ldots, 1],$$
$$\mathcal{H}_1: S_{\mathcal{A}}(t) = [0, 1, \ldots, 1],$$
$$\vdots$$
$$\mathcal{H}_{2^L - 1}: S_{\mathcal{A}}(t) = [0, 0, \ldots, 0], \tag{20}$$

where $S_{\mathcal{A}}(t) = \{S_n(t)\}_{n \in \mathcal{A}(t)} \in \{0, 1\}^L$ denotes the occupancy
states of the chosen channels $\mathcal{A}(t)$ in the current slot. The
a priori probabilities of these hypotheses can be learned from
the observation and decision history, which is characterized by
the belief vector. For example, given current belief vector $\Lambda(t)$
and chosen channels $\mathcal{A}(t)$, the a priori probability of $\mathcal{H}_0$ in this
slot is given by

$$\Pr\{\mathcal{H}_0\} = \sum_{s \in \mathbb{S}} \lambda_s(t) \prod_{n \in \mathcal{A}(t)} 1_{[s_n = 1]}. \tag{21}$$

This indicates how the sensing and access information at the
MAC layer (captured by the belief vector $\Lambda(t)$) can be used in the
design of the spectrum sensor at the PHY layer.
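
Equation (21) amounts to summing the belief over all states in which every chosen channel is idle. A minimal sketch, assuming the belief is stored as a dictionary from occupancy tuples to probabilities (the representation and the function name are hypothetical):

```python
from itertools import product

def prior_all_idle(belief, chosen):
    """Pr{H_0} from (21): total belief mass on states where all chosen
    channels are idle (state value 1)."""
    return sum(prob for state, prob in belief.items()
               if all(state[n] == 1 for n in chosen))

# Uniform belief over N = 3 channels, sensing channels 0 and 1 (L = 2).
uniform = {s: 1.0 / 8 for s in product((0, 1), repeat=3)}
p_h0 = prior_all_idle(uniform, chosen=(0, 1))
```

The priors of the other hypotheses follow in the same way by changing the occupancy pattern that is matched.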


Let $\Theta_{\mathcal{A}}(t) = \{\Theta_n(t)\}_{n \in \mathcal{A}(t)} \in \{0, 1\}^L$ denote the sensing
outcomes. Sensing errors occur if the spectrum sensor mistakes
one hypothesis for another, i.e., $\Theta_{\mathcal{A}}(t) \ne S_{\mathcal{A}}(t)$. Since there are
a total of $2^L$ hypotheses, the performance of the spectrum sensor
can be specified by a set $\mathcal{E}(t)$ of $2^L(2^L - 1)$ error probabilities

$$\mathcal{E}(t) = \{\Pr\{\text{detect } \mathcal{H}_i \mid \mathcal{H}_j \text{ is true}\} : 0 \le i, j \le 2^L - 1,\ i \ne j\}. \tag{22}$$

The optimal design of the spectrum sensor should achieve
a tradeoff among these $2^L(2^L - 1)$ error probabilities. Let
$\mathbb{A}_\delta^{(L)}(\mathcal{A})$ include all sets of achievable error probabilities. A
sensor operating policy specifies, in each slot $t$, a feasible sensor
operating point (i.e., a set of achievable error probabilities)
$\mathcal{E}(t) \in \mathbb{A}_\delta^{(L)}(\mathcal{A}(t))$ based on the current belief vector $\Lambda(t)$ and
the chosen channels $\mathcal{A}(t)$.

Sensing and Access Policies: At the beginning of each slot $t$,
a sensing policy specifies a set

$$\mathcal{A}(t) \in \mathbb{A}_s^{(L)} = \{\mathcal{A} \subseteq \{1, \ldots, N\},\ |\mathcal{A}| = L\}$$

of channels to be sensed based on the current belief vector $\Lambda(t)$.
Based on $\Lambda(t)$ and the imperfect sensing outcomes $\Theta_{\mathcal{A}}(t)$ given
by the spectrum sensor, an access policy decides whether to access:
$\Phi_{\mathcal{A}}(t) = \{\Phi_n(t)\}_{n \in \mathcal{A}(t)} \in \{0,1\}^L$. At the end of slot $t$,
the receiver acknowledges every successful transmission. The
acknowledgments are denoted by

$$K_{\mathcal{A}}(t) \triangleq \{K_n(t)\}_{n \in \mathcal{A}(t)} \in \{0,1\}^L$$

where $K_n(t) = S_n(t)\Phi_n(t)$. The immediate reward $R(t)$ is
given by

$$R(t) = \sum_{n \in \mathcal{A}(t)} K_n(t) B_n. \tag{23}$$
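The acknowledgment rule $K_n(t) = S_n(t)\Phi_n(t)$ and the reward (23) can be sketched in a few lines. The channel indices, bandwidths, and helper names below are hypothetical and serve only to make the bookkeeping concrete.

```python
def acknowledgments(states, access):
    """K_n = S_n * Phi_n for every chosen channel n (the keys of `access`)."""
    return {n: states[n] * access[n] for n in access}


def immediate_reward(acks, bandwidth):
    """R(t) = sum over chosen channels of K_n(t) * B_n, as in (23)."""
    return sum(acks[n] * bandwidth[n] for n in acks)


# Hypothetical slot: channels 1 and 3 are chosen; channel 2 is not sensed.
states = {1: 1, 2: 0, 3: 1}          # 1 = idle, 0 = busy
access = {1: 1, 3: 0}                # access decisions Phi_n on the chosen channels
bandwidth = {1: 2.0, 2: 1.0, 3: 3.0}

acks = acknowledgments(states, access)
reward = immediate_reward(acks, bandwidth)
```

Channel 3 is idle but not accessed, so it yields $K_3 = 0$, the same observation a busy channel would produce.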

Optimality Equation: Similar to Section III, we can formulate
the optimal design of OSA with multichannel sensing as
a constrained POMDP. We can also show that Proposition 1
holds, i.e., it is sufficient to consider deterministic sensor operating
and sensing policies for the optimal design of OSA with
multichannel sensing. Therefore, in each slot, the secondary
user needs to make the following decisions: which set $\mathcal{A}(t) \in \mathbb{A}_s^{(L)}$
of channels to sense, which sensor operating point $\mathcal{E}(t) \in \mathbb{A}_\delta^{(L)}(\mathcal{A}(t))$
to choose, and which set $\mathcal{F}(t) = \{f_n(\theta_{\mathcal{A}})\}$ of
transmission probabilities to use, where

$$f_n(\theta, t) = \Pr\{\Phi_n(t) = 1 \mid \Theta_{\mathcal{A}}(t) = \theta\} \in [0,1], \quad n \in \mathcal{A}(t),\ \theta \in \{0,1\}^L$$

is the probability of accessing chosen channel $n$ given belief
vector and sensing outcome $\Theta_{\mathcal{A}}(t) = \theta$. The composite action
space is denoted by

$$\mathbb{A}^{(L)} = \{(\mathcal{A}, \mathcal{E}, \mathcal{F}) : \mathcal{A} \in \mathbb{A}_s^{(L)},\ \mathcal{E} \in \mathbb{A}_\delta^{(L)}(\mathcal{A}),\ \mathcal{F} \in [0,1]^{L 2^L}\}.$$

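We can obtain the optimality equation and the design constraint as

$$V_t(\Lambda(t)) = \max_{A = \{\mathcal{A}, \mathcal{E}, \mathcal{F}\} \in \mathbb{A}^{(L)}} \sum_{s \in \mathbb{S}} \lambda_s(t) \sum_{k_{\mathcal{A}} \in \{0,1\}^L} U_A^{(L)}(k_{\mathcal{A}} \mid s) \left[R(t) + V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k_{\mathcal{A}}))\right], \quad 1 \le t < T \tag{24a}$$

$$V_T(\Lambda(T)) = \max_{A = \{\mathcal{A}, \mathcal{E}, \mathcal{F}\} \in \mathbb{A}^{(L)}} \sum_{s \in \mathbb{S}} \lambda_s(T) \sum_{k_{\mathcal{A}} \in \{0,1\}^L} U_A^{(L)}(k_{\mathcal{A}} \mid s) R(T) \tag{24b}$$

$$\text{s.t. } P_n(t) = \sum_{\theta_{\mathcal{A}}, s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid 0)\, l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})\, f_n(\theta_{\mathcal{A}}) \le \zeta, \quad \forall n, t \tag{24c}$$

where $\mathcal{F} = \{f_n(\theta)\}$ is a set of chosen transmission probabilities,

$$h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid i) = \Pr\{S_{\mathcal{A}}(t) = s_{\mathcal{A}} \mid S_n(t) = i\}$$

is the conditional distribution of channel occupancy states
$S_{\mathcal{A}}(t)$ given current belief vector $\Lambda(t)$,

$$l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}}) = \Pr\{\Theta_{\mathcal{A}}(t) = \theta_{\mathcal{A}} \mid S_{\mathcal{A}}(t) = s_{\mathcal{A}}\} \in \mathcal{E}$$

is the error probability determined by the current sensor operating
point, and the conditional distribution $U_A^{(L)}(k_{\mathcal{A}} \mid s)$ of
observations can be calculated as shown in (25). The updated belief
vector $\mathcal{T}(\Lambda(t) \mid A, k_{\mathcal{A}})$ can be obtained by using (25) and (10).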

In principle, the optimal decisions $\{\mathcal{A}^*, \mathcal{E}^*, \mathcal{F}^*\}$ in each slot
can be obtained by solving (24) recursively. However, without
any structural results on this constrained POMDP, solving (24) is
computationally prohibitive. A natural question here is whether there
exists a separation principle similar to Theorem 1 that can be
used to simplify the optimal design of OSA with multichannel
sensing.

B.  Separation  Principle 

A general separation principle does not exist for the joint design
of OSA with multichannel sensing. We show, however, that under
certain conditions, the separation principle established for the
single-channel sensing case can be applied in the multichannel
sensing scenarios.

Theorem 3: When the spectrum sensor and the access policy
are designed independently across channels, the separation principle
developed in Theorem 1 is valid for optimal OSA design
with multichannel sensing. In this case, the optimal spectrum
sensor adopts the optimal NP detector with PM equal to $\zeta$ and
detects the occupancy of a chosen channel by using the measurements
from this channel, and the optimal access decision
on a chosen channel is to trust the sensing outcome from this
channel. The optimal sensing policy can be obtained by solving
an unconstrained POMDP.

Proof: The proof is built upon that of Theorem 1. See
Appendix E. ∎


We  emphasize  that  the  extension  of  the  separation  principle 
to  multichannel  sensing  scenarios  is  based  on  the  condition  that 
the  spectrum  sensor  and  the  access  policy  are  designed  inde¬ 
pendently  across  channels.  Specifically,  we  assume  that  the  oc¬ 
cupancy  of  a  channel  is  detected  independently  of  the  mea¬ 
surements  taken  from  other  channels  and  the  access  decision 
on  a  channel  is  made  independently  of  the  sensing  outcomes 
from  other  channels.  Intuitively,  in  this  case,  the  design  of  spec¬ 
trum  sensor  and  access  policy  for  the  multichannel  L  >  1 
sensing  case  can  be  treated  as  L  independent  design  problems, 
one  for  each  chosen  channel.  Hence,  the  optimal  design  for  the 
single-channel  case  can  be  extended  to  L  >  1. 

Theorem  3  provides  sufficient  conditions  under  which  the  de¬ 
sign  given  by  the  separation  principle  (referred  to  as  the  SP  ap¬ 
proach  for  simplicity)  is  optimal.  In  Proposition  3,  we  show  that 
the  SP  approach  is  locally  optimal  (i.e.,  maximizes  the  instan¬ 
taneous  throughput)  under  certain  relaxed  conditions. 

Proposition  3:  Suppose  that  the  spectrum  sensor  is  designed 
independently  across  channels  while  the  access  policy  jointly 
exploits  the  sensing  outcomes  from  all  channels.  The  SP  ap¬ 
proach  is  locally  optimal  when  channels  evolve  independently. 

Proof: See Appendix F. ∎

It may seem plausible that the SP approach is (globally) optimal
when channels evolve independently since in this case the
sensing outcomes are independent across channels and independent
access decisions seem to suffice. Interestingly, counterexamples
can be constructed to show that introducing correlation
among access decisions across channels can improve the
overall throughput. The rationale behind this is that the joint access
design enables the secondary user to trade the immediate
access to “bad” channels (e.g., channels with small bandwidth)
for information on the occupancy states of “good” channels,
leading to potentially more rewarding future decisions. Specifically,
as noted in Section IV-B, the secondary user cannot distinguish
a busy channel $S_n(t) = 0$ from the decision of no access
$\Phi_n(t) = 0$ when observing $K_n(t) = 0$. However, if the access
decision $\Phi_m(t)$ on channel $m \neq n$ is correlated with $\Phi_n(t)$,
then we can infer the occupancy state of channel $n$ from both
$K_m(t)$ and $K_n(t)$. That is, by sacrificing the reward that can
be obtained in channel $m$ with small bandwidth, we can obtain
more information on the occupancy state of channel $n$.

C.  Heuristic  Approaches  to  Exploiting  Channel  Correlation 

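While simplifying the design of OSA with multichannel
sensing, the condition that the spectrum sensor and the access
policy are designed independently across channels can cause
throughput degradation since the correlation among channel
occupancies is ignored. We propose two heuristic approaches
to exploit the channel correlation: one at the PHY layer and the
other at the MAC layer.

$$\begin{aligned}
U_A^{(L)}(k_{\mathcal{A}} \mid s) &\triangleq \Pr\{K_{\mathcal{A}}(t) = k_{\mathcal{A}} \mid S(t) = s\} \\
&= \sum_{\theta_{\mathcal{A}} \in \{0,1\}^L} l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}}) \prod_{n \in \mathcal{A}} \Pr\{K_n(t) = k_n \mid \Theta_{\mathcal{A}}(t) = \theta_{\mathcal{A}}, S_{\mathcal{A}}(t) = s_{\mathcal{A}}\} \\
&= \sum_{\theta_{\mathcal{A}} \in \{0,1\}^L} l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}}) \prod_{n \in \mathcal{A}} \left[k_n s_n f_n(\theta_{\mathcal{A}}) + (1 - k_n)(1 - s_n f_n(\theta_{\mathcal{A}}))\right].
\end{aligned} \tag{25}$$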

1) Exploiting Channel Correlation at the PHY Layer: When
the  occupancy  states  are  correlated  across  channels,  we  have 
correlated  channel  measurements  at  the  PHY  layer.  Hence,  the 
measurements  at  all  chosen  channels  should  be  jointly  exploited 
in  spectrum  opportunity  identification.  With  this  in  mind,  we 
propose  a  heuristic  design  of  the  spectrum  sensor:  it  performs  L 
binary  hypothesis  tests,  one  for  each  chosen  channel,  by  using 
all  channel  measurements  and  adopting  the  optimal  NP  detector 
with PM equal to $\zeta$. We point out that, unlike the SP
sensor,  the  proposed  spectrum  sensor  performs  L  composite  hy¬ 
pothesis  tests  since  it  uses  all  channel  measurements  and  the 
occupancy  states  of  other  channels  are  unknown  in  each  hy¬ 
pothesis  test.  Hence,  the  structure  of  the  optimal  NP  detector 
adopted  by  this  heuristic  sensor  relies  on  the  joint  distribution  of 
the  channel  occupancy  states,  which  is  given  by  the  belief  vector 
(see  Section  V-D  for  an  example).  That  is,  the  spectrum  sensor 
design  is  affected  by  the  observation  and  decision  history  and 
thus  varies  with  time.  As  illustrated  in  Fig.  9,  the  performance 
of  this  spectrum  sensor  improves  over  time,  resulting  from  more 
informative  distribution  of  the  SOS  obtained  from  accumulating 
observations.  Note  that  the  design  of  this  spectrum  sensor  is 
much simpler than the $2^L$-ary hypothesis test given in (20).

Based  on  the  sensing  outcomes  given  by  this  sensor  that  ex¬ 
ploits  measurements  from  all  chosen  channels,  access  decisions 
are  made  independently  across  channels,  i.e.,  access  if  and  only 
if a channel is sensed as idle. We refer to this approach as the PHY
layer  approach. 

Proposition  4:  Suppose  that  the  access  policy  is  designed  in¬ 
dependently  across  channels  while  the  spectrum  sensor  jointly 
exploits  the  measurements  taken  from  all  chosen  channels. 
The  PHY  layer  approach  is  locally  optimal.  When  channels 
evolve  independently,  the  PHY  layer  approach  reduces  to  the 
SP  approach. 

Proof: See Appendix G. ∎

Note  that  the  PHY  layer  approach  is  locally  optimal  even 
when  channels  are  correlated. 

2) Exploiting Channel Correlation at the MAC Layer:
When channel occupancies are correlated, so are the sensing
outcomes given by the spectrum sensor. Hence, the channel
correlation can also be exploited at the MAC layer by making
access decisions jointly across channels. A heuristic MAC layer
approach is to adopt the spectrum sensor of the SP approach,
i.e., to detect the occupancy state of a channel by using only the
measurements of this channel, and then choose the access policy
that exploits sensing outcomes from all chosen channels to
maximize the instantaneous throughput. Specifically, for given
chosen channels $\mathcal{A} \in \mathbb{A}_s^{(L)}$ and belief vector $\Lambda(t)$ in slot $t$, we
choose transmission probabilities $\mathcal{F} = \{f_n(\theta_{\mathcal{A}})\} \in [0,1]^{L 2^L}$
as shown in (26a)-(26c), where the
conditional probability $h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid i)$ $(i = 0, 1)$ of the current
channel occupancies $S_{\mathcal{A}}(t)$ and the sensing error probability
$l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})$ are defined below (24).

The  access  policy  given  in  (26)  can  be  obtained  via  linear  pro¬ 
gramming.  Proposition  5  shows  that  this  MAC  layer  approach  is 
equivalent  to  the  SP  approach  when  the  SOS  evolves  indepen¬ 
dently  across  channels.  This  agrees  with  our  intuition  that  when 
channels  are  independent,  so  are  the  sensing  outcomes  from  the 
chosen  channels.  Hence,  independent  access  decisions  perform 
as  well  as  the  joint  one  in  terms  of  instantaneous  throughput. 

Proposition  5:  Suppose  that  the  spectrum  sensor  is  designed 
independently  across  channels  while  the  access  policy  jointly 
exploits  the  sensing  outcomes  from  all  chosen  channels.  When 
channels  evolve  independently,  the  MAC  layer  approach  re¬ 
duces  to  the  SP  approach  and  hence  is  locally  optimal. 

Proof: See Appendix F. ∎

D.  Numerical  Examples 

Next,  we  study  the  performance  of  the  SP,  the  PHY  layer,  and 
the  MAC  layer  approaches.  Note  that  these  three  approaches 
differ  in  the  spectrum  sensor  and  the  access  policy.  We  can  em¬ 
ploy  any  sensing  policy  to  compare  their  performance.  For  sim¬ 
plicity,  we  consider  a  myopic  sensing  policy  that  chooses  the  set 
A(t)  of  channels  to  maximize  the  expected  immediate  reward 
that  can  be  obtained  in  the  absence  of  sensing  errors,  i.e.,  for 
given belief vector $\Lambda(t)$ in slot $t$

$$\mathcal{A}(t) = \arg\max_{\mathcal{A} \in \mathbb{A}_s^{(L)}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n(t) = 1\}. \tag{27}$$

We  adopt  the  model  of  Gaussian  noise  and  Gaussian  pri¬ 
mary  signal  described  in  Section  IV-F.  In  this  case,  the  spectrum 
sensor  of  the  SP  approach  employs  an  energy  detector  given  in 
(18).  The  detection  threshold  pn  of  the  energy  detector  is  chosen 
so that the PM is fixed at $\zeta$.
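The myopic sensing rule (27) can be sketched by enumerating all size-$L$ channel subsets and scoring each by its expected idle bandwidth. This is a minimal brute-force illustration, not the authors' implementation; the belief representation and the names `idle_marginals` and `myopic_channels` are assumptions made for the example.

```python
from itertools import combinations


def idle_marginals(belief, num_channels):
    """Pr{S_n(t) = 1} for each channel, computed from the belief vector."""
    return [sum(p for s, p in belief.items() if s[n] == 1)
            for n in range(num_channels)]


def myopic_channels(belief, bandwidth, num_sensed):
    """Channel set maximizing sum_n B_n * Pr{S_n(t) = 1}, as in (27)."""
    n_ch = len(bandwidth)
    marg = idle_marginals(belief, n_ch)
    return max(combinations(range(n_ch), num_sensed),
               key=lambda chosen: sum(bandwidth[n] * marg[n] for n in chosen))


# Hypothetical belief over N = 4 channels (1 = idle, 0 = busy).
belief = {(0, 1, 1, 1): 0.7, (0, 0, 0, 0): 0.3}
best = myopic_channels(belief, bandwidth=[1.0] * 4, num_sensed=3)
```

Because the objective is additive across channels, the same set could be found by sorting channels by $B_n \Pr\{S_n(t) = 1\}$; the enumeration is kept here only because it mirrors (27) directly.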


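$$\mathcal{F} = \arg\max_{\mathcal{F} \in [0,1]^{L 2^L}} E[R(t) \mid \Lambda(t)] \tag{26a}$$

$$= \arg\max_{\mathcal{F} \in [0,1]^{L 2^L}} \sum_{n \in \mathcal{A}} B_n \Pr\{\Phi_n(t) S_n(t) = 1\}$$

$$= \arg\max_{\mathcal{F} \in [0,1]^{L 2^L}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n(t) = 1\} \sum_{\theta_{\mathcal{A}}, s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid 1)\, l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})\, f_n(\theta_{\mathcal{A}}) \tag{26b}$$

$$\text{s.t. } P_n(t) = \sum_{\theta_{\mathcal{A}}, s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid 0)\, l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})\, f_n(\theta_{\mathcal{A}}) \le \zeta, \quad \forall n \in \mathcal{A} \tag{26c}$$

Using the measurements $\{Y_n\}_{n \in \mathcal{A}}$ from all chosen channels,
the sensor employed by the PHY layer approach performs a
composite hypothesis test for each chosen channel $n$

$$\begin{aligned}
\mathcal{H}_0\ (S_n(t) = 1) &: \quad Y_n \sim \mathcal{N}(0_M,\ \sigma_{n,0}^2 I_M) \\
\mathcal{H}_1\ (S_n(t) = 0) &: \quad Y_n \sim \mathcal{N}(0_M,\ (\sigma_{n,0}^2 + \sigma_{n,1}^2) I_M), \\
& \quad\ \ Y_m \sim \mathcal{N}(0_M,\ (\sigma_{m,0}^2 + 1_{[S_m(t) = 0]}\, \sigma_{m,1}^2) I_M), \quad \forall m \in \mathcal{A}(t) \setminus \{n\}.
\end{aligned} \tag{28}$$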

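Note that the distribution of the measurements under each
hypothesis depends on the distribution of the current channel
occupancy states $S_{\mathcal{A}}(t) = \{S_n(t)\}_{n \in \mathcal{A}}$, which is given by
$h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid i)$ (defined below (24)) and can be calculated from
the current belief vector $\Lambda(t)$. In this case, the optimal NP
detector for (28) is given by a likelihood ratio test [23, Sec. 2.5]

$$\frac{\sum_{s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid 0) \prod_{m \in \mathcal{A}} p(Y_m \mid S_m = s_m)}{\sum_{s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid 1) \prod_{m \in \mathcal{A}} p(Y_m \mid S_m = s_m)} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \tau_n \tag{29}$$

where $h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid 0) = 0$ when $s_n \neq 0$ and $p(Y_n \mid S_n = s_n)$ is
the pdf of independent Gaussian channel measurements $Y_n$

$$p(Y_n \mid S_n = s_n) = \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi(\sigma_{n,0}^2 + 1_{[s_n = 0]}\, \sigma_{n,1}^2)}} \exp\left(-\frac{Y_{n,i}^2}{2(\sigma_{n,0}^2 + 1_{[s_n = 0]}\, \sigma_{n,1}^2)}\right). \tag{30}$$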

Note  that  when  channel  occupancies  are  independent,  the  above 
sensor  employed  by  the  PHY  layer  approach  is  equivalent  to 
that  of  the  SP  approach,  which  demonstrates  Proposition  4.  The 
PFA  and  the  PM  of  this  sensor  can  be  evaluated  via  simulation. 
In each slot, the detection threshold $\tau_n$ is chosen according to
the belief vector so that the resulting PM is fixed at $\zeta$, i.e., the
design  of  the  spectrum  sensor  varies  with  time. 
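The decision variable of the likelihood ratio test (29) with the Gaussian pdf (30) can be sketched as follows. This is only an illustration under stated assumptions: variances are in linear units, the conditional distributions $h_{S_{\mathcal{A}} \mid S_n}(\cdot \mid 0)$ and $h_{S_{\mathcal{A}} \mid S_n}(\cdot \mid 1)$ are passed in as dictionaries, all channels share the same noise and signal powers, and the function names are hypothetical. The threshold selection that fixes the PM at $\zeta$ is not shown.

```python
import math


def gauss_pdf(samples, var):
    """pdf (30) of M independent zero-mean Gaussian samples with variance var."""
    return math.prod(math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
                     for y in samples)


def likelihood_ratio(meas, h0, h1, var_noise, var_signal):
    """Decision variable of (29).

    meas[m]: list of samples taken on chosen channel m.
    h0, h1: dicts mapping occupancy tuples s_A to Pr{S_A = s_A | S_n = 0} and
            Pr{S_A = s_A | S_n = 1} for the channel n under test.
    """
    def mixture(h):
        total = 0.0
        for s, weight in h.items():
            lik = weight
            for m, samples in enumerate(meas):
                # A busy channel (s_m = 0) carries noise plus primary signal.
                var = var_noise + (var_signal if s[m] == 0 else 0.0)
                lik *= gauss_pdf(samples, var)
            total += lik
        return total

    return mixture(h0) / mixture(h1)


# Hypothetical independent channels: the second channel is idle w.p. q
# regardless of the first, so the ratio should depend on channel 0 only.
q = 0.3
h0 = {(0, 0): 1 - q, (0, 1): q}
h1 = {(1, 0): 1 - q, (1, 1): q}
meas = [[0.5], [2.0]]
lr = likelihood_ratio(meas, h0, h1, var_noise=1.0, var_signal=10.0)
single = gauss_pdf(meas[0], 11.0) / gauss_pdf(meas[0], 1.0)
```

In this independent example the contribution of the second channel cancels between numerator and denominator, so `lr` matches the single-channel ratio `single`, in line with the remark that the PHY layer sensor reduces to the SP sensor when channel occupancies are independent.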

As proven in Propositions 3-5, the PHY layer and the MAC
layer approaches are equivalent to the SP approach when channels
evolve independently. We thus compare below the performance
of these three approaches in correlated channels. Specifically,
we consider $N = 4$ correlated channels, each with bandwidth
$B_n = 1$. The transition probabilities of the SOS are given by

$$\begin{aligned}
P([0111] \mid [0000]) &= 0.6 \\
P([0000] \mid [0000]) &= 0.4 \\
P([0000] \mid [0111]) &= P([0000] \mid [1011]) = P([0000] \mid [1101]) = P([0000] \mid [1110]) = 0.2
\end{aligned}$$

and

$$P([1011] \mid [0111]) = P([1101] \mid [1011]) = P([1110] \mid [1101]) = P([0111] \mid [1110]) = 0.8.$$

The maximum allowable probability of collision is assumed to
be $\zeta = 0.05$. In each slot, $L = 3$ channels are chosen. The
spectrum sensor takes $M = 1$ measurement at each chosen
channel, and the noise and the primary signal powers are given
by $\sigma_{n,0}^2 = 0$ dB and $\sigma_{n,1}^2 = 10$ dB for all $n$.
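The simulation setup above can be written down compactly. The sketch below encodes the listed transition probabilities, reading $P(\cdot \mid \cdot)$ as P(next state | current state), which is the reading under which each row sums to one. It also propagates a belief one step without any observation and converts the dB powers to linear units. The data layout and helper names are assumptions made for illustration.

```python
# Transition probabilities of the SOS, keyed as P[current][next].
P = {
    "0000": {"0111": 0.6, "0000": 0.4},
    "0111": {"0000": 0.2, "1011": 0.8},
    "1011": {"0000": 0.2, "1101": 0.8},
    "1101": {"0000": 0.2, "1110": 0.8},
    "1110": {"0000": 0.2, "0111": 0.8},
}


def predict(belief, trans):
    """One-step prediction of the belief vector (no observation incorporated)."""
    nxt = {}
    for s, p in belief.items():
        for s2, q in trans[s].items():
            nxt[s2] = nxt.get(s2, 0.0) + p * q
    return nxt


def db_to_linear(db):
    """Convert a power in dB to linear scale."""
    return 10.0 ** (db / 10.0)


zeta, L, M = 0.05, 3, 1
var_noise = db_to_linear(0.0)     # noise power
var_signal = db_to_linear(10.0)   # primary signal power
belief_next = predict({"0000": 1.0}, P)
```

Only five of the sixteen possible occupancy states are reachable in this model, and in every nonzero state exactly one channel is busy, which is what makes the channel occupancies strongly correlated.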

Comparison  of  Sensor  Performance:  In  Fig.  9,  we  plot  the 
ROC curves ($1 - \delta_n$ versus $\epsilon_n$) of the SP sensor and the sensor
employed  by  the  PHY  layer  approach.  Note  that  the  sensor  em¬ 
ployed  by  the  MAC  layer  approach  is  the  same  as  the  SP  sensor. 
We see that the sensor of the PHY approach outperforms the
SP sensor. Specifically, for a fixed PM, the PFA of the
sensor  employed  by  the  PHY  approach  is  smaller  than  that  of  the 
SP  sensor.  This  is  because  the  sensor  of  the  PHY  approach  ex¬ 
ploits  the  correlation  among  channel  measurements  in  detection 
while  the  SP  sensor  uses  measurements  from  a  single  channel. 
We  also  observe  that  the  ROC  curve  of  the  sensor  of  the  PHY  ap¬ 
proach  improves  over  time  while  that  of  the  SP  sensor  remains 
the  same.  This  observation  can  be  explained  by  comparing  the 
optimal  detectors  (18)  and  (29).  Clearly,  the  energy  detector  (18) 
used  by  the  SP  sensor  is  static  and  so  is  its  performance.  As 
seen  from  (29),  the  decision  variable  of  the  sensor  of  the  PHY 
approach depends on the conditional distribution $h_{S_{\mathcal{A}} \mid S_n}(s_{\mathcal{A}} \mid i)$
of  the  channel  occupancies,  which  varies  with  time  according  to 
the  belief  vector.  As  time  t  increases,  the  belief  vector  provides 
more  information  on  the  SOS  due  to  the  accumulating  observa¬ 
tions,  leading  to  improved  sensor  performance.  Fig.  9  demon¬ 
strates  that  the  performance  of  the  spectrum  sensor  can  be  im¬ 
proved  by  incorporating  the  sensing  and  access  decisions  at  the 
MAC  layer,  which  are  encoded  in  the  belief  vector. 

Comparison  of  Throughput  Performance:  In  Fig.  10,  we 
compare  the  throughput  of  these  three  approaches.  As  expected, 
the  SP  approach,  which  ignores  the  channel  correlation,  per¬ 
forms  the  worst.  By  jointly  exploiting  the  sensing  outcomes  in 
access  decision-making,  the  MAC  layer  approach  can  improve 
throughput  performance.  A  much  larger  performance  gain  is 
achieved  by  the  PHY  layer  approach  which  jointly  exploits  the 
channel  measurements  in  spectrum  opportunity  identification. 
We  can  thus  see  that  exploiting  channel  correlation  at  the 
PHY  layer  is  more  effective  than  that  at  the  MAC  layer.  In 
other  words,  independent  opportunity  identification  at  the  PHY 
layer  hurts  the  throughput  more  than  independent  access  deci¬ 
sion-making  at  the  MAC  layer.  This  agrees  with  our  intuition 
because independent opportunity identification makes hard
decisions on whether the channel is idle. The correlation among
the resulting sensing outcomes is less informative than that
in the original channel measurements, leading to throughput
degradation.

Fig. 10. Comparison of normalized throughput (bit units per slot).


VI.  Conclusion 

Unique challenges in the design of OSA networks arise from
the  tension  between  the  secondary  users’  desire  for  performance 
and  the  primary  users’  need  for  protection.  Such  tension  dictates 
the  interaction  between  opportunity  identification  at  the  phys¬ 
ical  layer  and  opportunity  exploitation  at  the  MAC  layer,  and  a 
cross-layer  approach  is  necessary  to  achieve  optimality. 

In  this  paper,  we  have  developed  a  POMDP  framework  that 
captures  basic  components  and  design  tradeoffs  in  OSA.  We 
have  shown  that,  surprisingly,  there  exists  a  separation  principle 
in  the  optimal  joint  design  of  OSA  that  circumvents  the  curse 
of  dimensionality  in  general  POMDPs.  Being  able  to  obtain  the 
optimal  joint  design  in  closed  form  allows  us  to  characterize 
quantitatively  the  interaction  between  the  physical  and  MAC 
layers.  In  particular,  we  have  demonstrated  how  sensing  errors  at 
the  PHY  layer  affect  MAC  design  and  how  incorporating  MAC 
layer information into the physical layer leads to a cognitive spectrum
sensor whose performance improves over time by learning
from  accumulating  observations. 

We  have  not  taken  into  account  the  interactions  among  sec¬ 
ondary  users.  The  design  of  multiuser  sensing  strategies  is  ad¬ 
dressed  in  [26],  where  perfect  sensing  is  assumed.  The  POMDP 
framework  has  also  been  extended  in  [27]  to  address  the  joint 
design  of  OSA  in  unslotted  primary  networks. 


Appendix  A 

Proof  of  Proposition  1 

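We first prove the existence of a deterministic optimal
sensor operating policy. Suppose that channel $n$ is chosen
in the current slot. Let $\omega : \mathbb{A}_\delta(n) \to [0,1]$ be an arbitrary
pdf on the set of feasible sensor operating points, i.e.,
$\int_{(\epsilon, \delta) \in \mathbb{A}_\delta(n)} \omega(\epsilon, \delta)\, d\epsilon\, d\delta = 1$. We can compute the resulting
PFA $\epsilon_n$ and the PD $1 - \delta_n$ as

$$\epsilon_n = E[\epsilon] = \int_{(\epsilon, \delta) \in \mathbb{A}_\delta(n)} \epsilon\, \omega(\epsilon, \delta)\, d\epsilon\, d\delta, \tag{31a}$$

$$1 - \delta_n = E[1 - \delta] = \int_{(\epsilon, \delta) \in \mathbb{A}_\delta(n)} (1 - \delta)\, \omega(\epsilon, \delta)\, d\epsilon\, d\delta. \tag{31b}$$

Since $0 \le \epsilon \le 1 - \delta \le P_{D,\max}^{(n)}(\epsilon)$ for every sensor operating
point in $\mathbb{A}_\delta(n)$, we have

$$0 \le \epsilon_n \le 1 - \delta_n \le \int_{(\epsilon, \delta) \in \mathbb{A}_\delta(n)} P_{D,\max}^{(n)}(\epsilon)\, \omega(\epsilon, \delta)\, d\epsilon\, d\delta. \tag{32}$$

Since the best ROC curve $P_{D,\max}^{(n)}$ is concave, we have

$$E[P_{D,\max}^{(n)}(\epsilon)] \le P_{D,\max}^{(n)}(E[\epsilon])$$

and hence

$$0 \le \epsilon_n \le 1 - \delta_n \le P_{D,\max}^{(n)}(\epsilon_n).$$

That is, the resulting PFA and PM $(\epsilon_n, \delta_n)$ of any randomized
sensor operating policy $\omega$ belong to the set $\mathbb{A}_\delta(n)$. Therefore, it
is sufficient to consider deterministic sensor operating policies.

The spectrum sensor and the access policy should ensure that
the collision constraint is satisfied no matter which channel is
chosen. Let $v_n$ denote the maximum expected remaining reward
when channel $n$ is chosen in the current slot. Then, the deterministic
sensing policy that chooses channel $n^* = \arg\max_{n \in \mathbb{A}_s} v_n$
in this slot is optimal since the maximum expected remaining
reward that can be achieved by a randomized sensing policy is
$\sum_{n \in \mathbb{A}_s} v_n \mu(n) \le v_{n^*}$, where $\mu : \mathbb{A}_s \to [0,1]$ is a pmf on the
set $\mathbb{A}_s$.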
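The Jensen step behind (32) can be checked numerically on a toy example. The sketch below assumes a hypothetical concave best ROC curve $P_{D,\max}(\epsilon) = \sqrt{\epsilon}$ and a discrete randomization over three feasible operating points, both chosen only for illustration. It verifies that the averaged operating point remains inside the feasible region.

```python
import math


def pd_max(eps):
    """Hypothetical concave best ROC curve (illustration only)."""
    return math.sqrt(eps)


# Feasible operating points as (PFA, PD) pairs with PFA <= PD <= pd_max(PFA),
# together with the randomization weights of a mixed sensor operating policy.
points = [(0.04, 0.20), (0.25, 0.50), (0.64, 0.70)]
weights = [0.5, 0.3, 0.2]

eps_bar = sum(w * e for w, (e, _) in zip(weights, points))   # resulting PFA, cf. (31a)
pd_bar = sum(w * d for w, (_, d) in zip(weights, points))    # resulting PD, cf. (31b)
feasible = 0.0 <= eps_bar <= pd_bar <= pd_max(eps_bar)
```

The averaged point lies on or below the best ROC curve, so a deterministic operating point with the same PFA and PD is available.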

Appendix  B 
Proof  of  Theorem  1 

The proof of the separation principle is built upon the
following three lemmas. For ease of presentation, we define
$Q_t(\Lambda(t) \mid A)$ as the maximum expected remaining reward that
can be obtained starting from slot $t$ given that the current belief
vector is $\Lambda(t)$ and action $A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\} \in \mathbb{A}$
is taken in this slot, i.e.,

$$Q_t(\Lambda(t) \mid A) = \sum_{s \in \mathbb{S}} \lambda_s(t) \sum_{k=0}^{1} U_A(k \mid s) \left[k B_a + V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))\right]. \tag{33}$$

Let

$$A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\} \in \mathbb{A}$$

and

$$A' = \{a, (\epsilon'_a, \delta'_a), (f'_a(0), f'_a(1))\} \in \mathbb{A}$$

be two actions with the same channel selection but different
sensor operating points and transmission probabilities.

Lemma 1: The value function given in (8) is convex in
the belief vector. Specifically, at any time $t$, the value functions
$V_t(\Lambda_1(t))$ and $V_t(\Lambda_2(t))$ of any two belief vectors $\Lambda_1(t)$ and
$\Lambda_2(t)$ satisfy

$$V_t(\tau \Lambda_1(t) + (1 - \tau) \Lambda_2(t)) \le \tau V_t(\Lambda_1(t)) + (1 - \tau) V_t(\Lambda_2(t)), \quad \text{where } 0 \le \tau \le 1. \tag{34}$$

Proof: We use mathematical induction. From the value
function given in (8b), we can see that $V_T(\Lambda(T))$ in the last slot
$t = T$ is linear and hence convex in the belief vector $\Lambda(T)$. Suppose
that $V_t(\Lambda(t))$ is convex for every slot $t > t_0$. By the definition
of convex functions, we can show that the maximum remaining
reward $Q_t(\Lambda(t) \mid A)$ under an action $A \in \mathbb{A}$ is convex.
Since the maximum of a set of convex functions is convex, the
value function $V_{t_0}(\Lambda(t))$ in slot $t = t_0$ is convex and Lemma 1
follows. ∎

Lemma 2: If acknowledgment $K_a(t) = 1$ is observed in
a slot $t$, then the expected future reward, given by the value
function $V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, 1))$, is independent of the sensor
operating point $(\epsilon_a, \delta_a)$ and the transmission probabilities
$(f_a(0), f_a(1))$ employed in the current slot. That is

$$V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, 1)) = V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', 1)). \tag{35}$$

Proof: Applying the conditional observation probability
$U_A(1 \mid s)$ given in (9) to (10), we obtain the updated belief vector
$\Lambda^1(t+1) = \mathcal{T}(\Lambda(t) \mid A, 1)$ whose element $\lambda^1_{s'}(t+1)$ is given
by

$$\lambda^1_{s'}(t+1) = \frac{\sum_{s \in \mathbb{S}} \lambda_s(t)\, s_a \Pr\{S(t+1) = s' \mid S(t) = s\}}{\sum_{s \in \mathbb{S}} \lambda_s(t)\, s_a} \tag{36}$$

which is independent of the sensor operating point $(\epsilon_a, \delta_a)$ and
the transmission probabilities $(f_a(0), f_a(1))$. ∎

Lemma 3: In any slot $t$, the future rewards
$V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))$ and $V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', k))$ satisfy
the following inequality:

$$V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, 0)) \le \tau V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, 1)) + (1 - \tau) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', 0)) \tag{37}$$

where $\tau$ is given by

$$\tau = \frac{\sum_{s \in \mathbb{S}} \lambda_s(t) \left[U_A(0 \mid s) - U_{A'}(0 \mid s)\right]}{\sum_{s \in \mathbb{S}} \lambda_s(t)\, U_A(0 \mid s)}. \tag{38}$$

Proof: Applying the conditional observation probability
$U_A(k \mid s)$ given in (9) to (10), we can obtain the updated belief
vectors $\mathcal{T}(\Lambda(t) \mid A, k)$ and $\mathcal{T}(\Lambda(t) \mid A', k)$. After some algebra,
we reach the following equality:

$$\mathcal{T}(\Lambda(t) \mid A, 0) = \tau\, \mathcal{T}(\Lambda(t) \mid A, 1) + (1 - \tau)\, \mathcal{T}(\Lambda(t) \mid A', 0) \tag{39}$$

where $\tau$ is given by (38). Lemma 3 follows from the convexity
of the value function proven in Lemma 1. ∎

With the above three lemmas, we now prove the separation
principle. First notice that the expected immediate reward
$E[R(t) \mid \Lambda(t)]$ can be obtained as

$$E[R(t) \mid \Lambda(t)] = B_a \sum_{s \in \mathbb{S}} \lambda_s(t)\, U_A(1 \mid s) = \left[\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)\right] B_a \sum_{s \in \mathbb{S}} \lambda_s(t)\, s_a. \tag{40}$$

Since $B_a \sum_{s \in \mathbb{S}} \lambda_s(t)\, s_a$ is a constant for given belief vector
$\Lambda(t)$ and sensing action $a$, the expected immediate reward
$E[R(t) \mid \Lambda(t)]$ increases with the quantity $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$.

Second, we note that the sensor operating point $(\epsilon_a, \delta_a)$
and the transmission probabilities $(f_a(0), f_a(1))$ only affect
the expected remaining reward $Q_t(\Lambda(t) \mid A)$ defined in (33)
through the observation probability $U_A(1 \mid s) = s_a[\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)]$.
Therefore, if we can show that $Q_t(\Lambda(t) \mid A)$
increases with the quantity $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$, then this
will prove the separation principle.

To this end, we consider two actions $A$ and $A'$ such that
$\epsilon'_a f'_a(0) + (1 - \epsilon'_a) f'_a(1) \ge \epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ in slot $t$.
Comparing the resulting maximum expected remaining rewards
$Q_t(\Lambda(t) \mid A')$ and $Q_t(\Lambda(t) \mid A)$, we have

$$\begin{aligned}
Q_t(\Lambda(t) \mid A') - Q_t(\Lambda(t) \mid A)
&= \sum_{s \in \mathbb{S}} \lambda_s(t) \Big\{ B_a \left[U_{A'}(1 \mid s) - U_A(1 \mid s)\right] \\
&\qquad + \sum_{k=0}^{1} \left[U_{A'}(k \mid s) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', k)) - U_A(k \mid s) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))\right] \Big\} \\
&\ge \sum_{s \in \mathbb{S}} \lambda_s(t) \sum_{k=0}^{1} \left[U_{A'}(k \mid s) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', k)) - U_A(k \mid s) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))\right].
\end{aligned} \tag{41}$$

Applying Lemmas 2 and 3, we obtain after some algebra

$$Q_t(\Lambda(t) \mid A') - Q_t(\Lambda(t) \mid A) \ge 0 \tag{42}$$

which proves the monotonicity of the expected remaining reward
$Q_t(\Lambda(t) \mid A)$ with $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ and hence completes
the proof of the separation principle.
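The identities behind Lemmas 2 and 3 can be checked numerically on a toy model. The sketch below assumes the belief update has the usual form (Bayes' rule on the acknowledgment, followed by one state transition), uses $U_A(1 \mid s) = s_a q$ with $q = \epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$, and picks a hypothetical two-channel transition law. All names and numbers are illustrative assumptions, not taken from the source.

```python
from itertools import product

states = list(product((0, 1), repeat=2))  # (s_0, s_1), 1 = idle


def trans_prob(s, s2, stay=0.7):
    """Hypothetical transition law: each channel keeps its state w.p. `stay`."""
    p = 1.0
    for x, y in zip(s, s2):
        p *= stay if x == y else 1.0 - stay
    return p


def update(belief, a, q, k):
    """Assumed belief update: condition on K_a(t) = k, then apply one transition.

    Returns the updated belief and the normalizer Pr{K_a(t) = k}.
    """
    w = {s: p * (s[a] * q if k == 1 else 1.0 - s[a] * q) for s, p in belief.items()}
    z = sum(w.values())
    new = {s2: sum(w[s] * trans_prob(s, s2) for s in w) / z for s2 in states}
    return new, z


belief = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
q, q_prime = 0.5, 0.8  # actions A and A' on channel a = 0, with q_prime >= q

t_a0, d = update(belief, 0, q, 0)
t_a1, _ = update(belief, 0, q, 1)
t_ap0, d_prime = update(belief, 0, q_prime, 0)
t_ap1, _ = update(belief, 0, q_prime, 1)

tau = (d - d_prime) / d  # (38)
mix = {s: tau * t_a1[s] + (1.0 - tau) * t_ap0[s] for s in states}  # right side of (39)
```

Under these assumptions, `t_a1` and `t_ap1` coincide (the content of Lemma 2), and `t_a0` equals the convex combination `mix` (equality (39)), with `tau` between 0 and 1.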


Appendix  C 

Proof  of  Proposition  2 

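When $\delta_a = 1$, we have $\epsilon_a = 0$ and the constraint given in (12)
reduces to $f_a(1) \le \zeta$, so the objective function
$\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ given in (12a) is maximized when
$f_a^*(1) = \zeta$. When $\delta_a \in [0, 1)$, the constraint given in (12) can
be written as

$$0 \le f_a(0) \le \frac{\zeta - \delta_a f_a(1)}{1 - \delta_a}. \tag{43}$$

Applying (43) to the objective function in (12a), we obtain that

$$\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1) \le f_a(1) \left[1 - \frac{\epsilon_a}{1 - \delta_a}\right] + \frac{\epsilon_a \zeta}{1 - \delta_a} \tag{44}$$

where the equality holds when $f_a(0) = \frac{\zeta - \delta_a f_a(1)}{1 - \delta_a}$. Since
$1 - \delta_a \ge \epsilon_a$ (see footnote 2), the right-hand side of (44) increases
with $f_a(1)$. Hence, to maximize the objective function
$\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$, we should choose the largest $f_a(1)$
such that $f_a(0) = \frac{\zeta - \delta_a f_a(1)}{1 - \delta_a} \ge 0$ (see (43)). Therefore, when
$\delta_a \le \zeta$, $f_a^*(1) = 1$ and, correspondingly, $f_a^*(0) = \frac{\zeta - \delta_a}{1 - \delta_a}$. When
$\delta_a > \zeta$, $f_a^*(1) = \frac{\zeta}{\delta_a}$ and, correspondingly, $f_a^*(0) = 0$.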
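The closed-form access rule derived above can be compared against a brute-force search over the feasible region $(1 - \delta_a) f_a(0) + \delta_a f_a(1) \le \zeta$. The sketch below assumes $0 < \zeta < 1$ and $1 - \delta_a \ge \epsilon_a$; the function names and the test values are hypothetical.

```python
def optimal_access(delta, zeta):
    """Closed-form (f(0), f(1)) from the derivation above; assumes 0 < zeta < 1."""
    if delta <= zeta:
        return (zeta - delta) / (1.0 - delta), 1.0
    return 0.0, zeta / delta


def objective(eps, f0, f1):
    """eps * f(0) + (1 - eps) * f(1), the quantity maximized in (12a)."""
    return eps * f0 + (1.0 - eps) * f1


def collision(delta, f0, f1):
    """(1 - delta) * f(0) + delta * f(1), the constrained collision probability."""
    return (1.0 - delta) * f0 + delta * f1


def grid_best(eps, delta, zeta, steps=200):
    """Best objective over a grid of feasible (f(0), f(1)) pairs."""
    best = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            f0, f1 = i / steps, j / steps
            if collision(delta, f0, f1) <= zeta + 1e-12:
                best = max(best, objective(eps, f0, f1))
    return best


eps, zeta = 0.1, 0.05
closed_low = objective(eps, *optimal_access(0.02, zeta))    # case delta <= zeta
closed_high = objective(eps, *optimal_access(0.20, zeta))   # case delta > zeta
```

In both cases the grid search should not exceed the closed-form value, and it should approach it as the grid is refined.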

Appendix  D 
Proof  of  Theorem  2 

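Applying the optimal transmission probabilities
$(f_a^*(0), f_a^*(1))$ given in Proposition 2 to the objective function
(12a), we obtain that

$$\epsilon_a f_a^*(0) + (1 - \epsilon_a) f_a^*(1) = \begin{cases} 1 - \dfrac{(1 - \zeta)\, \epsilon_a}{1 - \delta_a}, & \delta_a \le \zeta \\[2ex] \dfrac{\zeta (1 - \epsilon_a)}{\delta_a}, & \delta_a > \zeta. \end{cases} \tag{45}$$

Since the best ROC curve is concave [23, Sec. 2.2], both
$\frac{\epsilon_a}{1 - \delta_a}$ and $\frac{1 - \epsilon_a}{\delta_a}$ increase with $\epsilon_a$ and hence decrease with
$\delta_a$. From (45), we can see that the objective function
$\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ increases with $\delta_a$ when $\delta_a < \zeta$,
but decreases when $\delta_a > \zeta$. Hence, the maximum is achieved
when $\delta_a^* = \zeta$. Correspondingly, the optimal transmission
probabilities $(f_a^*(0), f_a^*(1))$ are given by $(0, 1)$.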
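The conclusion that the objective peaks at $\delta_a = \zeta$ can be illustrated by sweeping operating points along a best ROC curve. The sketch below assumes a hypothetical concave curve $P_{D,\max}(\epsilon) = \sqrt{\epsilon}$ and evaluates the objective obtained under the optimal access rule of Proposition 2, written as $1 - (1 - \zeta)\epsilon/(1 - \delta)$ for $\delta \le \zeta$ and $\zeta(1 - \epsilon)/\delta$ otherwise. The curve, the grid, and the names are illustrative assumptions.

```python
import math


def pd_max(eps):
    """Hypothetical concave best ROC curve (illustration only)."""
    return math.sqrt(eps)


def best_objective(eps, zeta):
    """Objective under the optimal access rule, for the point (eps, 1 - pd_max(eps))."""
    delta = 1.0 - pd_max(eps)
    if delta <= zeta:
        return 1.0 - (1.0 - zeta) * eps / (1.0 - delta)
    return zeta * (1.0 - eps) / delta


zeta = 0.05
grid = [i / 10000 for i in range(1, 10000)]   # PFA values strictly inside (0, 1)
eps_star = max(grid, key=lambda e: best_objective(e, zeta))
delta_star = 1.0 - pd_max(eps_star)
value_star = best_objective(eps_star, zeta)
```

On this example the maximizer has a PM of approximately $\zeta$, and the maximal objective equals $1 - \epsilon$ at that point, which corresponds to trusting the sensing outcome, i.e., $(f(0), f(1)) = (0, 1)$.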

Appendix  E 
Proof  of  Theorem  3 

Let $A^{(L)} = \{\mathcal{A}, \{(\epsilon_n, \delta_n)\}_{n \in \mathcal{A}}, \{(f_n(0), f_n(1))\}_{n \in \mathcal{A}}\} \in \mathbb{A}^{(L)}$
denote a joint composite action taken in a slot $t$ and
$A_n = \{n, (\epsilon_n, \delta_n), (f_n(0), f_n(1))\} \in \mathbb{A}$ denote the corresponding
actions taken on each individual chosen channel
$n \in \mathcal{A}$. When the spectrum sensor is designed independently
across channels, we can write

$$l_{\Theta_{\mathcal{A}} \mid S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}}) = \Pr\{\Theta_{\mathcal{A}}(t) = \theta_{\mathcal{A}} \mid S_{\mathcal{A}}(t) = s_{\mathcal{A}}\} = \prod_{n \in \mathcal{A}} \Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = s_n\}$$

in a product form since the occupancy of a channel is detected
independently of the measurements at other chosen channels.
When the access policy is designed independently across channels,
we have $f_n(\theta_{\mathcal{A}}) = f_n(\theta_n)$ for all sensing outcomes
$\theta_{\mathcal{A}} \in \{0,1\}^L$. Therefore, we can write the conditional observation
probability $U_A^{(L)}(k_{\mathcal{A}} \mid s)$ as (46).
Similarly, after some algebra, the design constraint in (24c) can
be written as (47).

Applying (46) to (24), we can see that the sensor operating
point $(\epsilon_n, \delta_n)$ and transmission probabilities $(f_n(0), f_n(1))$ of
a chosen channel $n \in \mathcal{A}$ affect the maximum remaining reward
only through $U_{A_n}(1 \mid s) = s_n[\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)]$, which
is independent of the actions $\{A_m\}_{m \in \mathcal{A} \setminus \{n\}}$ taken on the other
channels. Moreover, the simplified constraint (47) reveals that
the collision probability of a channel $n$ is also independent of
the actions taken at other channels. Therefore,
the design of the sensor operating and access policies can be
decoupled across channels. Following the same proof as given
in Appendix B, we can show that the expected remaining reward
increases with $\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)$ of every chosen channel
$n \in \mathcal{A}$.

On the other hand, the expected immediate reward
$E[R(t) \mid \Lambda(t)]$ is given by

$$E[R(t) \mid \Lambda(t)] = \sum_{n \in \mathcal{A}} B_n \Pr\{\Phi_n(t) S_n(t) = 1\} = \sum_{n \in \mathcal{A}} B_n \Pr\{S_n(t) = 1\} \left[\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)\right] \tag{48}$$

which also increases with $\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)$. Therefore,
the separation principle developed in Theorem 1 holds for $L > 1$.
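The factorization used in this proof can be checked numerically: with per-channel sensors and per-channel access rules, the joint observation probability computed from the general sum over sensing outcomes, as in (25), equals the product of single-channel observation probabilities, as in (46). The sketch below uses hypothetical operating points and access probabilities. Here `sense[n][s][theta]` stands for $\Pr\{\Theta_n = \theta \mid S_n = s\}$ and `access[n][theta]` for $f_n(\theta)$.

```python
from itertools import product


def u_joint(k, s, sense, access):
    """Joint observation probability: sum over all sensing outcome vectors."""
    num = len(k)
    total = 0.0
    for theta in product((0, 1), repeat=num):
        term = 1.0
        for n in range(num):
            f = access[n][theta[n]]
            term *= sense[n][s[n]][theta[n]]
            term *= k[n] * s[n] * f + (1 - k[n]) * (1.0 - s[n] * f)
        total += term
    return total


def u_single(k_n, s_n, sense_n, access_n):
    """Single-channel observation probability U_{A_n}(k_n | s)."""
    return sum(sense_n[s_n][th]
               * (k_n * s_n * access_n[th] + (1 - k_n) * (1.0 - s_n * access_n[th]))
               for th in (0, 1))


def sensor(eps, delta):
    """Rows indexed by s: busy (s = 0) -> [1 - delta, delta]; idle (s = 1) -> [eps, 1 - eps]."""
    return [[1.0 - delta, delta], [eps, 1.0 - eps]]


# Hypothetical per-channel designs for L = 2 chosen channels.
sense = [sensor(0.10, 0.05), sensor(0.20, 0.03)]
access = [[0.0, 1.0], [0.1, 0.9]]

max_gap = 0.0
for s in product((0, 1), repeat=2):
    for k in product((0, 1), repeat=2):
        prod_form = 1.0
        for n in range(2):
            prod_form *= u_single(k[n], s[n], sense[n], access[n])
        max_gap = max(max_gap, abs(u_joint(k, s, sense, access) - prod_form))
```

For an idle channel, `u_single(1, 1, ...)` evaluates to $\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)$, the quantity through which each channel's design enters the reward.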

Appendix  F 

Proof  of  Propositions  3  and  5 

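Let $\mathcal{A} \in \mathbb{A}_s^{(L)}$ denote a set of chosen channels and $\mathcal{A}_n^- = \mathcal{A} \setminus \{n\}$
be the set of chosen channels excluding $n$.
Since channels evolve independently, we have
$h_{S^- \mid S_n}(s^- \mid 0) = h_{S^- \mid S_n}(s^- \mid 1)$, where $s^- = \{s_m\}_{m \in \mathcal{A}_n^-}$
and

$$h_{S^- \mid S_n}(s^- \mid i) = \Pr\{S_{\mathcal{A}_n^-}(t) = s^- \mid S_n(t) = i\}.$$

Hence, given belief vector $\Lambda(t)$ and chosen channels $\mathcal{A}$ in slot $t$,
the myopic (i.e., locally optimal) sensor operating point $(\epsilon_n, \delta_n)$
and transmission probabilities $\mathcal{F} = \{f_n(\theta_{\mathcal{A}})\}$ are given by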


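$$
\begin{aligned}
U_{A}(k_{\mathcal{A}} \mid s)
&= \sum_{\theta_{\mathcal{A}} \in \{0,1\}^L} \prod_{n \in \mathcal{A}} \Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = s_n\}\,[k_n s_n f_n(\theta_n) + (1 - k_n)(1 - s_n f_n(\theta_n))] \\
&= \prod_{n \in \mathcal{A}} \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = s_n\}\,[k_n s_n f_n(\theta_n) + (1 - k_n)(1 - s_n f_n(\theta_n))] \\
&= \prod_{n \in \mathcal{A}} U_{A_n}(k_n \mid s).
\end{aligned}
\tag{46}
$$

$$
P_n(t) = \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = 0\}\, f_n(\theta_n) = (1 - \delta_n) f_n(0) + \delta_n f_n(1) \le \zeta, \quad \forall n \in \mathcal{A}.
\tag{47}
$$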
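$$
\begin{aligned}
\{(\epsilon_n, \delta_n), \mathcal{F}\}
&= \arg\max_{\substack{(\epsilon_n, \delta_n) \in \mathbb{A}_\delta(n) \\ \mathcal{F} \in [0,1]^{L 2^L}}} E[R(t) \mid \Lambda(t)] \\
&= \arg\max_{\substack{(\epsilon_n, \delta_n) \in \mathbb{A}_\delta(n) \\ \mathcal{F} \in [0,1]^{L 2^L}}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n(t) = 1\} \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = 1\}\, g_n(\theta_n) \\
&= \arg\max_{\substack{(\epsilon_n, \delta_n) \in \mathbb{A}_\delta(n) \\ \mathcal{F} \in [0,1]^{L 2^L}}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n(t) = 1\}\,[\epsilon_n g_n(0) + (1 - \epsilon_n) g_n(1)]
\end{aligned}
\tag{49a}
$$

$$
\text{s.t.} \quad P_n(t) = \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = 0\}\, g_n(\theta_n) = (1 - \delta_n) g_n(0) + \delta_n g_n(1) \le \zeta, \quad \forall n \in \mathcal{A}
\tag{49b}
$$

$$
g_n(\theta_n) = \sum_{\theta^- \in \{0,1\}^{L-1}} \; \sum_{s^- \in \{0,1\}^{L-1}} f_n(\theta^-, \theta_n) \Pr\{S_{\mathcal{A}_n^-}(t) = s^-\} \prod_{m \in \mathcal{A}_n^-} \Pr\{\Theta_m(t) = \theta_m \mid S_m(t) = s_m\}
\tag{50}
$$

$$
\{\mathcal{E}, \{(f_n(0), f_n(1))\}_{n \in \mathcal{A}}\}
\tag{51a}
$$

$$
= \arg\max_{\mathcal{E};\; f_n(0), f_n(1) \in [0,1]} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n(t) = 1\}\,[\Pr\{\Theta_n(t) = 1 \mid S_n(t) = 1\}\, f_n(1) + \Pr\{\Theta_n(t) = 0 \mid S_n(t) = 1\}\, f_n(0)]
\tag{51b}
$$

$$
\text{s.t.} \quad P_n(t) = \Pr\{\Theta_n(t) = 1 \mid S_n(t) = 0\}\, f_n(1) + \Pr\{\Theta_n(t) = 0 \mid S_n(t) = 0\}\, f_n(0) \le \zeta, \quad \forall n \in \mathcal{A}
\tag{51c}
$$

$$
\Pr\{\Theta_n(t) = \theta_n \mid S_n(t) = s_n\}
= \sum_{\theta^-,\, s^- \in \{0,1\}^{L-1}} \Pr\{\Theta_{\mathcal{A}_n^-}(t) = \theta^-, \Theta_n(t) = \theta_n \mid S_{\mathcal{A}_n^-}(t) = s^-, S_n(t) = s_n\} \Pr\{S_{\mathcal{A}_n^-}(t) = s^- \mid S_n(t) = s_n\}
\tag{52}
$$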


(26) as shown in (49a)-(49b) at the top of the page, where
$g_n(\theta_n) \in [0, 1]$ is defined as in (50), also at the top of the page,
and $\theta^- = \{\theta_m\}_{m \in \mathcal{A}_n^-}$. We see from (49) that the myopic
approach should maximize $\epsilon_n g_n(0) + (1 - \epsilon_n) g_n(1)$ under the
constraint $(1 - \delta_n) g_n(0) + \delta_n g_n(1) \le \zeta$ for every chosen
channel $n \in \mathcal{A}$, leading to the same optimization problem as
(12). By Theorem 2, $\delta_n = \zeta$ and $(g_n(0), g_n(1)) = (0, 1)$
are the solution to (49). That is, the SP sensor is locally
optimal. Furthermore, since $(g_n(0), g_n(1)) = (0, 1)$ is achieved
by choosing $f_n(\theta^-, \theta_n) = 1_{[\theta_n = 1]}$ in (50), transmission
probabilities $f_n(\theta_{\mathcal{A}}) = \theta_n$ are locally optimal, which completes the
proof of Proposition 3.
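The per-channel problem in (49) can be illustrated with a brute-force search. The sketch below is not the closed-form argument of Theorem 2. It is a grid search, under assumed numbers, over $(g_n(0), g_n(1)) \in [0,1]^2$ for one channel whose operating point satisfies $\delta_n = \zeta$. The function name and the values $\epsilon_n = 0.2$ and $\delta_n = \zeta = 0.1$ are hypothetical.

```python
from math import isclose


def best_access_probs(eps, delta, zeta, steps=100):
    """Grid search for (g(0), g(1)) in [0, 1]^2 that maximizes
    eps*g(0) + (1 - eps)*g(1) subject to the collision constraint
    (1 - delta)*g(0) + delta*g(1) <= zeta.

    Returns (best_reward, (g0, g1)). This is an illustrative check only,
    not the method used in the proof.
    """
    tol = 1e-12  # guards against floating-point rounding on the boundary
    best_reward, best_g = -1.0, None
    for i in range(steps + 1):
        g0 = i / steps
        for j in range(steps + 1):
            g1 = j / steps
            collision = (1.0 - delta) * g0 + delta * g1
            if collision > zeta + tol:
                continue  # violates the collision constraint
            reward = eps * g0 + (1.0 - eps) * g1
            if reward > best_reward + tol:
                best_reward, best_g = reward, (g0, g1)
    return best_reward, best_g


# Assumed example: false alarm 0.2, miss detection equal to zeta = 0.1.
reward, g = best_access_probs(eps=0.2, delta=0.1, zeta=0.1)
```

For these assumed values the search selects $(g_n(0), g_n(1)) = (0, 1)$ with objective value $1 - \epsilon_n$, that is, transmit if and only if the sensor reports the channel as idle. This agrees with the solution stated above.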

Proposition  5  follows  directly  from  the  fact  that  the  MAC 
layer  approach  employs  the  myopic  access  policy  and  the  SP 
sensor,  which  has  been  proven  to  be  locally  optimal. 

Appendix  G 

Proof  of  Proposition  4 

When the access policy is designed independently across
channels, we have $f_n(\theta_{\mathcal{A}}) = f_n(\theta_n)$ for any sensing outcome
$\Theta_{\mathcal{A}}(t) = \theta_{\mathcal{A}}$ from chosen channels $\mathcal{A}$. Hence, given belief
vector $\Lambda(t)$ and chosen channels $\mathcal{A}$ in slot $t$, the myopic
spectrum sensor $\mathcal{E}$ and access decisions $\{(f_n(0), f_n(1))\}_{n \in \mathcal{A}}$ are
given by (51a)-(51c) at the top of the page, where (52), also at
the top of the page, is determined by the sensor operating point
$\mathcal{E} \in \mathbb{A}_\delta^{(L)}$ and the current belief vector $\Lambda(t)$ (see Appendix F
for notation definitions). Since (51) has the same form as (12),
the PHY layer approach is locally optimal.

Furthermore, when the SOS evolves independently across
channels, the measurements from different channels are
independent. Hence, the sensor employed by the PHY layer
approach is equivalent to the SP sensor.